CN112947581A - Multi-unmanned aerial vehicle collaborative air combat maneuver decision method based on multi-agent reinforcement learning - Google Patents
Multi-unmanned aerial vehicle collaborative air combat maneuver decision method based on multi-agent reinforcement learning
- Publication number: CN112947581A (application CN202110318644.5A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
- G05D1/104—Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a multi-unmanned aerial vehicle collaborative air combat maneuver decision method based on multi-agent reinforcement learning, which solves the problem of autonomous maneuver decision-making in multi-UAV collaborative air combat within simulated many-to-many engagements. The method comprises the following steps: creating a motion model of the unmanned aerial vehicle platform; evaluating the multi-aircraft air combat situation based on attack zones, distance and angle factors, and deriving the state space, action space and reward value of the multi-aircraft maneuver decision; and designing a target allocation method and a strategy coordination mechanism for cooperative air combat, in which the distribution of reward values defines each unmanned aerial vehicle's behavioral feedback on target allocation, situational advantage and safe collision avoidance, so that strategy cooperation is achieved after training. The invention effectively improves the ability of multiple unmanned aerial vehicles to make autonomous collaborative air combat maneuver decisions, exhibits strong cooperativity and autonomous optimization, and continuously raises the decision-making level of the unmanned aerial vehicle formation through continuous simulation and learning.
Description
Technical Field
The invention belongs to the technical field of unmanned aerial vehicles, and particularly relates to a multi-unmanned aerial vehicle collaborative air combat maneuver decision method.
Background
At present, unmanned aerial vehicles can complete tasks such as reconnaissance, surveillance and ground attack, and play an increasingly important role in modern warfare. However, limited by their level of intelligence, unmanned aerial vehicles cannot yet make autonomous air combat maneuver decisions, especially in autonomous multi-UAV cooperative air combat. Raising the intelligence level of unmanned aerial vehicles, so that they can complete air combat maneuvers autonomously according to the situational environment and control commands, is therefore a main current research direction.
For an unmanned aerial vehicle to make autonomous air combat maneuver decisions is, in essence, to complete the mapping from air combat situations to maneuvers and to execute the appropriate maneuver in each situation. Because the air combat situation is more complex than that of other tasks, its situation space is difficult to cover completely with manual pre-programming, and computing the optimal maneuver decision is harder still.
At present, research on UAV air combat maneuver decision-making is mostly conducted in 1v1 single-aircraft confrontation scenarios, whereas in actual air combat multiple unmanned aerial vehicles generally fight in cooperative formations. Multi-aircraft cooperative air combat involves three closely coupled aspects: air combat situation assessment, multi-target allocation and maneuver decision-making. Compared with the maneuver decisions of single-aircraft confrontation, it must consider tactical coordination in addition to the enlarged force scale, which makes the problem considerably more complex.
Research on multi-aircraft cooperative air combat decision-making can be divided into centralized and distributed approaches. In the centralized approach, one center computes the actions of all unmanned aerial vehicles in the formation; the models are complex and suffer from high computational difficulty and insufficient real-time performance. In the distributed approach, each unmanned aerial vehicle in the formation computes its own maneuver on the basis of target allocation, which reduces model complexity, while cooperation on the formation task is achieved through the allocation itself. Most existing distributed cooperative air combat decision methods first perform target allocation and then convert the many-to-many air combat into one-to-one engagements according to the allocation result; they therefore cannot fully exploit multi-target attack capability and formation-level tactical coordination, and fail to achieve a 1+1 > 2 effect.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a multi-unmanned aerial vehicle collaborative air combat maneuver decision method based on multi-agent reinforcement learning, which solves the problem of autonomous maneuver decision-making in multi-UAV collaborative air combat within simulated many-to-many engagements. The method comprises the following steps: creating a motion model of the unmanned aerial vehicle platform; evaluating the multi-aircraft air combat situation based on attack zones, distance and angle factors, and deriving the state space, action space and reward value of the multi-aircraft maneuver decision; and designing a target allocation method and a strategy coordination mechanism for cooperative air combat, in which the distribution of reward values defines each unmanned aerial vehicle's behavioral feedback on target allocation, situational advantage and safe collision avoidance, so that strategy cooperation is achieved after training. The invention effectively improves the ability of multiple unmanned aerial vehicles to make autonomous collaborative air combat maneuver decisions, exhibits strong cooperativity and autonomous optimization, and continuously raises the decision-making level of the unmanned aerial vehicle formation through continuous simulation and learning.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step 1: establishing a multi-aircraft air combat environment model, and defining the state space, action space and reward value for each unmanned aerial vehicle's maneuver decisions in the multi-aircraft cooperative air combat process;
step 1-1: in the ground coordinate system, the ox axis points due east, the oy axis due north and the oz axis vertically upward; the motion model of the unmanned aerial vehicle in the ground coordinate system is shown as formula (1):
in the ground coordinate system, the dynamic model of the unmanned aerial vehicle is shown as formula (2):
wherein (x, y, z) represents the position of the drone in the ground coordinate system and v represents the drone speed; ẋ, ẏ and ż are the components of the speed v on the three coordinate axes x, y and z; the flight-path angle γ is the angle between the velocity v and the horizontal plane o-x-y; the heading angle ψ is the angle between the projection v′ of the velocity v on the o-x-y plane and the oy axis; and g is the gravitational acceleration. [n_x, n_z, μ] are the control variables used to maneuver the unmanned aerial vehicle: n_x is the overload along the velocity direction, representing the thrust and deceleration of the drone; n_z is the overload in the pitch direction, i.e. the normal overload; and μ is the roll angle about the velocity vector. The speed of the drone is controlled through n_x, while n_z and μ control the direction of the velocity vector, thereby controlling the maneuver;
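The equations of formulas (1) and (2) appear only as images in the source. As an illustration, the following sketch assumes the standard three-degree-of-freedom point-mass model consistent with the variable definitions above (ox east, oy north, heading measured from oy); the exact patent equations may differ, and all function names are illustrative.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def step(state, control, dt=0.1):
    """One Euler step of a standard 3-DOF point-mass UAV model.

    state:   (x, y, z, v, gamma, psi) -- position, speed, flight-path
             angle, heading angle measured from the oy (north) axis.
    control: (nx, nz, mu) -- tangential overload, normal overload,
             roll angle about the velocity vector.
    """
    x, y, z, v, gamma, psi = state
    nx, nz, mu = control
    # kinematics: components of v on the three axes (assumed formula (1))
    dx = v * math.cos(gamma) * math.sin(psi)
    dy = v * math.cos(gamma) * math.cos(psi)
    dz = v * math.sin(gamma)
    # dynamics driven by the controls [nx, nz, mu] (assumed formula (2))
    dv = G * (nx - math.sin(gamma))
    dgamma = G / v * (nz * math.cos(mu) - math.cos(gamma))
    dpsi = G * nz * math.sin(mu) / (v * math.cos(gamma))
    return (x + dx * dt, y + dy * dt, z + dz * dt,
            v + dv * dt, gamma + dgamma * dt, psi + dpsi * dt)

# straight-and-level flight north: nz = 1 balances gravity, only y changes
s = step((0.0, 0.0, 1000.0, 100.0, 0.0, 0.0), (0.0, 1.0, 0.0), dt=0.1)
```

With n_x = 0, μ = 0 and n_z = 1, the sketch holds speed, altitude and heading constant while the drone advances north, matching the intuition that n_z and μ steer the velocity vector while n_x changes its magnitude.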
step 1-2: the missile is assumed to have tail-attack capability only. In the interception zone of the missile, v_U and v_T denote the speeds of the unmanned aerial vehicle and the target, respectively; D is the distance vector, representing the relative position of the unmanned aerial vehicle and the target; α_U and α_T denote the angle between the UAV velocity vector and the distance vector D and the angle between the target velocity vector and D, respectively;
Let the maximum interception distance of the missile be D_m and its field-of-view angle be given; the interception zone of the missile is then a conical region Ω. The maneuvering goal of the unmanned aerial vehicle in air combat is to bring the target into its own interception zone Ω_U while preventing itself from entering the target's interception zone Ω_T;
According to the definition of the missile interception zone, if the target is inside the interception zone of the own missile, the own side can launch weapons to attack the target and is at an advantage. The advantage value η_U obtained when the unmanned aerial vehicle intercepts the target is defined as:
wherein (x_T, y_T, z_T) represents the position coordinates of the target and Re is a positive number;
The advantage value η_T obtained by the target intercepting the unmanned aerial vehicle is defined as:
wherein (x_U, y_U, z_U) represents the position coordinates of the drone;
In air combat, the advantage value η_A obtained by the unmanned aerial vehicle from interception opportunities is defined as:
η_A = η_U − η_T (4)
The advantage value η_B obtained from the angle and distance parameters of the two sides is defined as:
The above formula shows that when the unmanned aerial vehicle tails the target, the advantage value is η_B = 1; when the unmanned aerial vehicle is tailed by the target, the advantage value is η_B = −1; and when the distance between the unmanned aerial vehicle and the target exceeds the maximum interception distance of the missile, the advantage value is attenuated according to an exponential function;
Combining formulas (4) and (5), the situation assessment function η of the air combat in which the unmanned aerial vehicle is located is obtained as:
η = η_A + η_B (6)
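The exact form of formula (5) is an image in the source. The following sketch is a hedged reconstruction that only reproduces the properties stated in the text: +1 in a pure tail-chase, −1 when being tailed, and exponential attenuation beyond the maximum interception distance D_m. The specific angle term and the decay constant k are assumptions, not taken from the patent.

```python
import math

def eta_b(alpha_u, alpha_t, d, d_m, k=1000.0):
    """Illustrative angle/distance advantage with the properties of
    formula (5): alpha_u, alpha_t are the angles between each velocity
    vector and the distance vector D; d is the current distance."""
    # angle term: 1 when tailing (both angles 0), -1 when tailed (both pi)
    angle = 1.0 - (alpha_u + alpha_t) / math.pi
    # exponential attenuation once d exceeds the interception range d_m
    decay = 1.0 if d <= d_m else math.exp(-(d - d_m) / k)
    return angle * decay

eta_tailing = eta_b(0.0, 0.0, 2000.0, 3000.0)
eta_tailed = eta_b(math.pi, math.pi, 2000.0, 3000.0)
```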
step 1-3: the geometric relationship of the air combat situation at any moment is completely determined by information contained in an unmanned aerial vehicle position vector, an unmanned aerial vehicle speed vector, a target position vector and a target speed vector in the same coordinate system, so that the description of the air combat situation is composed of the following 5 aspects:
1) speed information of the unmanned aerial vehicle, including the speed magnitude v_U, track angle γ_U and heading angle ψ_U;
2) speed information of the target, including the speed magnitude v_T, track angle γ_T and heading angle ψ_T;
3) the relative position of the unmanned aerial vehicle and the target, represented by the distance vector D: its modulus is D = ‖D‖, γ_D is the angle of D with the horizontal plane o-x-y, and ψ_D is the angle between the projection of D on the horizontal plane o-x-y and the oy axis; the relative position is thus represented by D, γ_D and ψ_D;
4) the relative motion of the unmanned aerial vehicle and the target, comprising the angle α_U between the UAV velocity vector and the distance vector D and the angle α_T between the target velocity vector and D;
5) the altitude z_U of the unmanned aerial vehicle and the altitude z_T of the target;
Based on variables 1) to 5) above, the 1v1 air combat situation at any time can be completely characterized; the state space of the 1v1 maneuver decision model is therefore the 13-dimensional vector space s:
s = [v_U, γ_U, ψ_U, v_T, γ_T, ψ_T, D, γ_D, ψ_D, α_U, α_T, z_U, z_T] (7)
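As an illustration, the 13-dimensional state of equation (7) can be assembled from the two aircraft states; the tuple layout and function names below are assumptions for the sketch, not taken from the patent.

```python
import math

def vel_vec(v, gamma, psi):
    # velocity components on (ox east, oy north, oz up)
    return (v * math.cos(gamma) * math.sin(psi),
            v * math.cos(gamma) * math.cos(psi),
            v * math.sin(gamma))

def state_1v1(uav, tgt):
    """Assemble the 13-dimensional state vector of equation (7).
    uav, tgt: (x, y, z, v, gamma, psi) tuples (illustrative layout)."""
    (xu, yu, zu, vu, gu, pu), (xt, yt, zt, vt, gt, pt) = uav, tgt
    dx, dy, dz = xt - xu, yt - yu, zt - zu
    D = math.sqrt(dx * dx + dy * dy + dz * dz)
    gamma_D = math.asin(dz / D)       # angle of D with the o-x-y plane
    psi_D = math.atan2(dx, dy)        # angle of D's projection from oy
    vu_vec, vt_vec = vel_vec(vu, gu, pu), vel_vec(vt, gt, pt)
    dot_u = sum(a * b for a, b in zip(vu_vec, (dx, dy, dz)))
    dot_t = sum(a * b for a, b in zip(vt_vec, (dx, dy, dz)))
    alpha_U = math.acos(max(-1.0, min(1.0, dot_u / (vu * D))))
    alpha_T = math.acos(max(-1.0, min(1.0, dot_t / (vt * D))))
    return [vu, gu, pu, vt, gt, pt, D, gamma_D, psi_D,
            alpha_U, alpha_T, zu, zt]

# UAV at the origin flying north; target 1 km due north, also flying north
s13 = state_1v1((0, 0, 1000, 100, 0.0, 0.0), (0, 1000, 1000, 90, 0.0, 0.0))
```

In this tail-chase geometry both α_U and α_T are zero, which is exactly the maximum-advantage case described for η_B above.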
The situation assessment function η is adopted as the air combat maneuver decision reward value R, so that the effect of each action value on the air combat situation is reflected through the assessment function, i.e. R = η;
step 1-4: in multi-aircraft air combat, let the number of unmanned aerial vehicles be n, denoted UAV_i (i = 1, 2, …, n), and the number of targets be m, denoted Target_j (j = 1, 2, …, m); the number of targets is assumed not to exceed the number of drones, i.e. m ≤ n;
Denote by s_ij the relative state between any UAV_i and Target_j, and by s_ik the relative state between UAV_i and any other friendly UAV_k. The observation state of any UAV_i in the multi-aircraft air combat is then:
S_i = [∪ s_ij | j = 1, 2, …, m, ∪ s_ik | k = 1, 2, …, n (k ≠ i)] (8)
During multi-aircraft air combat, each unmanned aerial vehicle makes its own maneuver decision according to its situation in the air combat environment. According to the dynamics model of formula (2), flight is controlled through the three variables n_x, n_z and μ; the action space of UAV_i is therefore A_i = [n_xi, n_zi, μ_i];
In multi-aircraft cooperative air combat, the situation assessment values η_A and η_B between each unmanned aerial vehicle and each target are computed from formulas (4) and (5), respectively, and the assessment values between UAV_i and Target_j are recorded accordingly. In addition, the influence of the relative state of a friendly UAV_k on UAV_i's own situation is considered, and the situation assessment function of UAV_i with respect to friendly UAV_k is therefore defined as:
wherein D_ik is the distance between UAV_i and friendly UAV_k, D_safe is the minimum safe distance between two drones, and P is a positive number.
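The body of formula (9) is an image in the source; from the surrounding definitions it penalizes violations of the minimum safe separation. The following is a hedged sketch of one such term — the threshold form and the value of P are assumptions.

```python
def friend_term(d_ik, d_safe, P=10.0):
    """Illustrative friendly-aircraft situation term: a penalty of -P
    when two drones are closer than the minimum safe distance D_safe,
    and 0 otherwise (exact form of formula (9) not given in text)."""
    return -P if d_ik < d_safe else 0.0

near = friend_term(50.0, 100.0)   # separation violated
far = friend_term(150.0, 100.0)   # safe separation kept
```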
Step 2: establishing a multi-aircraft cooperative target allocation method, and determining the target allocation rule used during reinforcement learning training;
step 2-1: in the air combat, n unmanned aerial vehicles fight m targets, with n ≥ m; according to equation (6), the situation assessment value of UAV_i (i = 1, 2, …, n) with respect to Target_j (j = 1, 2, …, m) is recorded as η_ij;
Let the target allocation matrix be X = [x_ij], where x_ij = 1 denotes that Target_j is allocated to UAV_i and x_ij = 0 denotes that it is not. Each drone can launch missiles at up to L targets located in its attack zone simultaneously, i.e. Σ_j x_ij ≤ L. Meanwhile, to ensure during combat that no target is omitted or abandoned, each target must be allocated at least one attacking unmanned aerial vehicle, i.e. Σ_i x_ij ≥ 1, and all unmanned aerial vehicles are required to join the battle.
Taking maximization of the situational advantage of the unmanned aerial vehicles over the targets as the objective, the target allocation model is established as follows:
step 2-2: in the target allocation process, targets inside an attack zone are allocated first, and targets outside attack zones afterwards; the target allocation method is therefore divided into the following two parts:
step 2-2-1: preferentially distributing targets located in the attack area;
Two n × m matrices H_A and H_B are constructed, whose elements are the assessment values η_A and η_B between each UAV_i and Target_j. From equation (3), if Target_j is inside the attack zone of UAV_i, the corresponding element of H_A attains its maximum value, and otherwise it does not; the x_ij at the positions where this in-zone condition holds (the zero elements of the resulting difference matrix) are set to 1. During allocation, if the number x of targets inside the attack zone of UAV_i exceeds the drone's maximum number of attackable targets, i.e. x > L, the corresponding element values of UAV_i in the H_B matrix are sorted and the L targets with the largest values are allocated to UAV_i;
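Step 2-2-1 can be sketched as follows; the function name, the boolean in-zone matrix and the list-of-lists layout are illustrative assumptions, with the H_B values standing in for the angle/distance advantages.

```python
def assign_in_zone(in_zone, H_B, L):
    """Stage 1 (sketch): allocate targets already inside attack zones.
    in_zone[i][j] is True when Target j lies in UAV i's attack zone;
    H_B[i][j] is the angle/distance advantage; at most L targets per UAV."""
    n, m = len(in_zone), len(in_zone[0])
    X = [[0] * m for _ in range(n)]
    for i in range(n):
        # in-zone targets sorted by advantage, best first, capped at L
        js = sorted((j for j in range(m) if in_zone[i][j]),
                    key=lambda j: H_B[i][j], reverse=True)
        for j in js[:L]:
            X[i][j] = 1
    return X

X1 = assign_in_zone([[True, True], [False, True]],
                    [[0.5, 0.9], [0.1, 0.2]], L=1)
```

With L = 1 each drone keeps only its highest-advantage in-zone target, which here means both drones engage Target 1.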
Step 2-2-2: allocating targets located outside the attack area;
For a UAV_i to which a target inside its attack zone has already been allocated, no target outside the attack zone may be allocated. For multiple targets outside the attack zone, the unmanned aerial vehicle cannot maneuver so as to bring several of them into the zone simultaneously, so a drone whose targets are outside the attack zone can be allocated only one target. After the in-zone allocation is completed, the remaining allocation work therefore reduces to assigning 1 target to each unallocated unmanned aerial vehicle, which is realized with the Hungarian algorithm as follows:
First, according to the current target allocation matrix X = [x_ij]_{n×m}, delete from H_B every i-th row and j-th column in which x_ij = 1, obtaining a reduced matrix. The allocation result is computed on this reduced matrix with the Hungarian algorithm; since n ≥ m and L > 0, a padding (margin-complementing) step completes the algorithm, the target allocation is realized, and the corresponding x_ij are set to 1;
After the above two steps, the allocation of all targets is complete and the target allocation matrix X = [x_ij]_{n×m} is obtained;
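Step 2-2-2 can be sketched with a brute-force search over the unassigned drones, which for small formations is an exact stand-in for the Hungarian algorithm; the padding of leftover drones is omitted, and all names are illustrative.

```python
from itertools import permutations

def assign_remaining(X, H_B):
    """Stage 2 (sketch): give each still-unassigned target exactly one
    still-unassigned UAV, maximizing total advantage. Brute force over
    permutations replaces the Hungarian algorithm for small n, m."""
    n, m = len(X), len(X[0])
    free_uavs = [i for i in range(n) if not any(X[i])]
    free_tgts = [j for j in range(m) if not any(X[i][j] for i in range(n))]
    best, best_perm = float("-inf"), None
    for perm in permutations(free_uavs, len(free_tgts)):
        score = sum(H_B[i][j] for i, j in zip(perm, free_tgts))
        if score > best:
            best, best_perm = score, perm
    if best_perm:
        for i, j in zip(best_perm, free_tgts):
            X[i][j] = 1
    return X

# UAV 2 already holds Target 0; Target 1 goes to the best free drone
X2 = assign_remaining([[0, 0], [0, 0], [1, 0]],
                      [[0.2, 0.9], [0.3, 0.4], [0.0, 0.0]])
```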
Step 3: designing a multi-aircraft cooperative maneuver strategy learning algorithm and determining the reinforcement learning training logic;
The multi-aircraft cooperative maneuver strategy learning algorithm comprises a strategy coordination mechanism and a strategy learning mechanism:
step 3-1: designing a strategy coordination mechanism;
The air combat confrontation is regarded as a competitive game between n unmanned aerial vehicles and m targets, modeled within the framework of a stochastic game; a stochastic game can be represented by a tuple comprising the state space, the action spaces, the transition function and the reward functions. S denotes the state space of the current game, shared by all agents; the action space of UAV_i is defined as A_i and that of Target_i as B_i; T : S × A^n × B^m → S denotes the deterministic transition function of the environment, and r_i denotes the reward value function of UAV_i. The action spaces of the drones within each formation in the cooperative air combat are identical, i.e. A_i = A for each UAV_i and B_j = B for each Target_j;
The global reward value of the unmanned aerial vehicle formation is defined as the average of the reward values of the individual drones, namely:
wherein r(s, a, b) represents the reward value obtained by the drone formation when, at time t, the environment state is s, the drone formation takes action a ∈ A^n and the target formation takes action b ∈ B^m;
The goal of the drone formation is to learn a strategy that maximizes the expected discounted accumulation of reward values, where 0 < λ ≤ 1 is the discount factor; the stochastic game is thereby transformed into a Markov decision problem:
wherein Q*(·) represents the state-action value function for executing action a in state s, r(s, a) represents the reward value for executing action a in state s, θ represents the network parameters of the policy function, s′ represents the state at the next time step, and a_θ represents the parameterized policy function;
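The equation of the Markov decision formulation above is rendered only as an image in the source; a hedged LaTeX reconstruction consistent with the surrounding definitions (deterministic transition s′ = T(s, a), discount λ, parameterized policy a_θ) is:

```latex
Q^{*}\big(s, a_{\theta}(s)\big) \;=\; r\big(s, a_{\theta}(s)\big)
\;+\; \lambda\, Q^{*}\big(s', a_{\theta}(s')\big),
\qquad s' = T\big(s, a_{\theta}(s)\big)
```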
the reward value function for each drone is defined as:
wherein r_i(s, a, b) represents the reward value obtained by UAV_i when, at time t, the environment state is s, the drone formation takes action a ∈ A^n and the target formation takes action b ∈ B^m; its first term characterizes the situational advantage of UAV_i relative to the target assigned to it, and its second term is a penalty constraining the distance between UAV_i and friendly aircraft;
Based on formula (13), for the n individual unmanned aerial vehicles there are n Bellman equations of the form of formula (14), in which the policy function a_θ shares the same parameters θ:
wherein Q_i denotes the state-action value function of UAV_i for performing action a in state s, and r_i(s, a) denotes the reward value obtained by UAV_i for performing action a in state s;
step 3-2: designing a strategy learning mechanism;
A multi-UAV maneuver decision model is established with a bidirectional recurrent neural network (BRNN);
The multi-UAV air combat maneuver decision model consists of an Actor network and a Critic network: the Actor network is formed by connecting the Actor networks of all individual drones through the BRNN, and the Critic network by connecting their Critic networks through the BRNN. The hidden layers of the policy network (Actor) and the Q network (Critic) of the single-drone decision model are set as BRNN recurrent units in the multi-UAV model, and the BRNN is then unrolled according to the number of drones. The input of the multi-UAV air combat maneuver decision model is the current air combat situation, and the output is the action value of each unmanned aerial vehicle;
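The BRNN coupling described above can be sketched as a forward and a backward recurrent pass over the UAV sequence, so that each drone's feature depends on both its "earlier" and "later" neighbors in the formation. The weight shapes, tanh activation and function names below are assumptions for illustration, not the patent's architecture details.

```python
import numpy as np

def brnn_layer(obs, Wx, Wf, Wb):
    """Sketch of a bidirectional recurrent layer over the UAV sequence.
    obs: (n_uav, d_in) per-UAV observations; Wx maps inputs, Wf/Wb are
    the forward/backward recurrent weights (all illustrative)."""
    n = obs.shape[0]
    d_h = Wf.shape[0]
    hf = np.zeros((n, d_h))
    hb = np.zeros((n, d_h))
    h = np.zeros(d_h)
    for i in range(n):                  # forward direction: UAV 1 -> n
        h = np.tanh(Wx @ obs[i] + Wf @ h)
        hf[i] = h
    h = np.zeros(d_h)
    for i in reversed(range(n)):        # backward direction: UAV n -> 1
        h = np.tanh(Wx @ obs[i] + Wb @ h)
        hb[i] = h
    # concatenated states give each drone a formation-aware feature
    return np.concatenate([hf, hb], axis=1)

rng = np.random.default_rng(0)
obs = rng.normal(size=(4, 6))           # 4 UAVs, 6 observation features
Wx = 0.1 * rng.normal(size=(8, 6))
Wf = 0.1 * rng.normal(size=(8, 8))
Wb = 0.1 * rng.normal(size=(8, 8))
out = brnn_layer(obs, Wx, Wf, Wb)
```

Unrolling over the number of drones, rather than over time, is what lets one shared parameter set serve formations of different sizes.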
Define the objective function of UAV_i as the expectation of the accumulated individual reward r_i under the state distribution obtained by following the action policy a_θ with state transition function T, which is stationary in the ergodic Markov decision process; the joint objective function of the n drones is denoted J(θ):
According to the multi-agent deterministic policy gradient theorem, for the objective function J(θ) of the n drones in equation (15), the gradient with respect to the policy network parameters θ is
A parameterized Critic function Q_ξ(s, a) is used to estimate the state-action function in equation (16). When training the Critic, a sum-of-squares loss function is adopted; the gradient of the parameterized Critic function Q_ξ(s, a) is given by equation (17), where ξ denotes the parameters of the Q network:
Based on formulas (16) and (17), the Actor and Critic networks are optimized by stochastic gradient descent; in the interactive learning process, the parameters are updated with data obtained through trial and error, completing the learning and optimization of the cooperative air combat strategy;
step 3-3: according to the strategy coordination mechanism and the strategy learning mechanism, the reinforcement learning training process for determining the multi-unmanned aerial vehicle collaborative air combat maneuver decision model is as follows:
step 3-3-1: initialization. Determine the forces and situations of both sides, arranging n unmanned aerial vehicles against m targets for the air combat confrontation, with n ≥ m. Randomly initialize the parameters θ of the Actor online network and ξ of the Critic online network, then assign them to the corresponding target networks, i.e. θ′ ← θ and ξ′ ← ξ, where θ′ and ξ′ are the parameters of the Actor and Critic target networks, respectively. Initialize an experience pool R_1 for storing the experience data obtained through exploratory interaction, and initialize a random process ε for exploration of action values;
step 3-3-2: determine the initial training state, i.e. the relative situation of the two sides at the start of the air combat. Set the initial position and speed information of every drone in the UAV formation and the target formation, i.e. determine the (x, y, z, v, γ, ψ) information of each aircraft, and compute the initial air combat state s_1 according to the definition of the state space; let t = 1;
step 3-3-3: repeatedly run multi-episode training from the initial state, performing the following operations in each single-episode air combat simulation:
First, according to the current air combat state s_t, compute the target allocation matrix X_t with the target allocation method above. Each UAV_i then generates an action value from state s_t and the random process ε and executes it; at the same time, each Target_i in the target formation performs its own action. After execution the state transitions to s_{t+1}, and the reward value of each drone is computed according to equation (13). The transition process variables are stored as one piece of experience data in the experience pool R_1. During learning, a batch of M pieces of experience data is randomly sampled from R_1 and the target Q value of each unmanned aerial vehicle is computed; that is, for each of the M data items:
The gradient estimate of the Critic is calculated according to equation (17) as follows:
The gradient estimate of the Actor is calculated according to equation (16) as follows:
The online network parameters of the Actor and Critic are updated with an optimizer using the obtained gradient estimates Δξ and Δθ; after the online network optimization is completed, the target network parameters are updated by soft update, i.e. θ′ ← κθ + (1 − κ)θ′ and ξ′ ← κξ + (1 − κ)ξ′, wherein κ ∈ (0, 1);
step 3-3-4: after a single episode of simulation ends, if the set maximum number of episodes has been reached, stop the reinforcement learning training; otherwise add 1 to t and repeat step 3-3-3.
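The experience pool and soft target update of the training loop above can be sketched as follows; the class name, capacity and transition layout are illustrative assumptions.

```python
import random
from collections import deque

class ReplayPool:
    """Minimal experience pool R_1 (sketch): stores transition tuples
    (s_t, a_t, b_t, r_t, s_{t+1}) and samples minibatches of size M."""
    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)

    def store(self, transition):
        self.data.append(transition)

    def sample(self, M):
        return random.sample(list(self.data), M)

def soft_update(target, online, kappa):
    """theta' <- kappa * theta + (1 - kappa) * theta', element-wise."""
    return [kappa * o + (1 - kappa) * t for t, o in zip(target, online)]

pool = ReplayPool()
for t in range(64):
    pool.store((t, "a", "b", 0.0, t + 1))   # placeholder transitions
batch = pool.sample(8)
theta_target = soft_update([0.0, 0.0], [1.0, 2.0], kappa=0.01)
```

Small κ means the target networks trail the online networks slowly, which stabilizes the bootstrapped target Q values during training.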
The invention has the following beneficial effects:
Based on a multi-agent reinforcement learning method, the invention establishes a method for generating multi-UAV cooperative air combat maneuver strategies: a bidirectional recurrent neural network is adopted as a communication network connecting the individual drones into a formation-level cooperative decision network, and a multi-UAV cooperative air combat maneuver decision model is built under the Actor-Critic architecture, unifying the learning of individual drone behaviors with the overall combat objective of the formation. Unlike approaches that decompose a multi-aircraft air combat into several 1v1 engagements, the multi-UAV cooperative air combat maneuver decision model established by the invention can acquire cooperative maneuver strategies through autonomous learning and realizes tactical coordination during air combat, achieving a situational advantage for the formation as a whole and defeating the opponents.
Drawings
FIG. 1 is a three-degree-of-freedom particle motion model of the unmanned aerial vehicle.
FIG. 2 is a one-on-one close-range air combat situation diagram of the present invention.
FIG. 3 is a diagram showing the result of the maneuver decision of the UAV under the condition of uniform velocity and linear flight.
FIG. 4 is the model structure of the multi-UAV cooperative air combat maneuver decision based on the bidirectional recurrent neural network.
FIG. 5 is a schematic diagram of an air combat simulated maneuver trajectory based on learned strategies after training is completed.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The invention aims to provide a method for generating a multi-unmanned aerial vehicle collaborative air combat autonomous maneuver decision based on multi-agent reinforcement learning.
The invention realizes the consistency of the state understanding of the unmanned aerial vehicles through the communication network. According to the characteristics of multi-target attack, the reinforcement learning reward value of each unmanned aerial vehicle is calculated by combining target distribution and the air combat situation evaluation value, and the individual reinforcement learning process is guided through the reward of each unmanned aerial vehicle, so that the tactical targets of the formation are closely combined with the learning target of a single unmanned aerial vehicle, and a collaborative tactical maneuver strategy is generated. Tactical coordination is realized in the air combat process, the situation advantage of the whole formation combat is achieved, and the opponents are combed.
A multi-unmanned aerial vehicle collaborative air combat maneuver decision method based on multi-agent reinforcement learning comprises the following steps:
step 1: establishing a multi-machine air combat environment model, and defining a state space, an action space and a reward value for each unmanned aerial vehicle to make a maneuver decision in the multi-machine cooperative air combat process;
step 1-1: in the ground coordinate system, the ox axis points due east, the oy axis due north, and the oz axis vertically upward; the kinematic model of the UAV in the ground coordinate system is shown in formula (1):

dx/dt = v·cosγ·sinψ, dy/dt = v·cosγ·cosψ, dz/dt = v·sinγ (1)

in the ground coordinate system, the dynamic model of the UAV is shown in formula (2):

dv/dt = g(n_x − sinγ), dγ/dt = (g/v)(n_z·cosμ − cosγ), dψ/dt = g·n_z·sinμ/(v·cosγ) (2)

where (x, y, z) denotes the position of the UAV in the ground coordinate system, v denotes the UAV speed, and dx/dt, dy/dt, dz/dt denote the components of the speed v on the three coordinate axes x, y, z; the flight-path angle γ is the angle between the UAV speed v and the horizontal plane o-x-y; the heading angle ψ is the angle between the projection v′ of the UAV speed v on the o-x-y plane and the oy axis; g denotes gravitational acceleration; [n_x, n_z, μ] are the control variables for maneuvering the UAV: n_x is the overload in the speed direction, representing thrust and deceleration actions; n_z is the overload in the pitch direction, i.e., normal overload; μ is the roll angle about the UAV velocity vector; n_x controls the UAV speed, while n_z and μ control the direction of the velocity vector and thus the maneuver of the UAV, as shown in fig. 1;
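The point-mass model of formulas (1)–(2) can be sketched as a simple Euler-integration step. This is an illustrative sketch, not part of the patent: the step size `dt` and the value of g are assumptions, and a real simulation would use a finer integrator.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2 (assumed value)

def step_dynamics(state, control, dt=0.02):
    """One Euler step of the 3-DOF point-mass model of formulas (1)-(2).

    state   = (x, y, z, v, gamma, psi): position, speed, flight-path angle, heading
    control = (nx, nz, mu): tangential overload, normal overload, roll angle
    """
    x, y, z, v, gamma, psi = state
    nx, nz, mu = control
    # Dynamics, formula (2): overloads steer the speed magnitude and direction
    dv = G * (nx - math.sin(gamma))
    dgamma = G / v * (nz * math.cos(mu) - math.cos(gamma))
    dpsi = G * nz * math.sin(mu) / (v * math.cos(gamma))
    # Kinematics, formula (1): oy is due north, psi measured from the oy axis
    x += v * math.cos(gamma) * math.sin(psi) * dt
    y += v * math.cos(gamma) * math.cos(psi) * dt
    z += v * math.sin(gamma) * dt
    return (x, y, z, v + dv * dt, gamma + dgamma * dt, psi + dpsi * dt)
```

With n_x = 0, n_z = 1, μ = 0 and γ = 0, the overloads exactly balance gravity and the UAV flies straight and level, which is a quick sanity check of the sign conventions.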
step 1-2: the missile is assumed to have tail-chase attack capability only; in the missile interception zone, v_U and v_T denote the speeds of the UAV and the target, respectively; D is the distance vector, representing the positional relation between the UAV and the target; α_U and α_T denote the angle between the UAV velocity vector and the distance vector D and the angle between the target velocity vector and the distance vector D, respectively;

setting the maximum interception distance of the missile to D_m together with its field angle, the interception zone of the missile is a conical region Ω; the maneuvering goal of the UAV in air combat is to make the target enter its own interception zone Ω_U while avoiding entering the target's interception zone Ω_T;
According to the definition of the missile interception zone, if the target is inside the interception zone of one's own missile, one's own side can launch a weapon to attack the target and holds the advantage. The advantage value η_U when the UAV intercepts the target is defined as:

η_U = Re if (x_T, y_T, z_T) ∈ Ω_U, and η_U = 0 otherwise (3)

where (x_T, y_T, z_T) denotes the position coordinates of the target, and Re is a large positive number that can be adjusted manually to guide the training of the model;

the advantage value η_T obtained when the target intercepts the UAV is defined analogously, with η_T = Re if (x_U, y_U, z_U) ∈ Ω_T and η_T = 0 otherwise, where (x_U, y_U, z_U) denotes the position coordinates of the UAV;

in the air combat, the advantage value η_A obtained by the UAV from the interception opportunity is defined as:

η_A = η_U − η_T (4)
in addition, because the field angle of an aircraft gun and of some missiles is small, a launch condition can be formed only in a tail-chase situation, so the requirement on the angle relation is strict; the advantage value η_B based on the angle and distance parameters of the gun and the missile is defined as:

the above formula shows that when the UAV tail-chases the target, the advantage value is η_B = 1; when the UAV is tail-chased by the target, the advantage value is η_B = −1; when the distance between the UAV and the target exceeds the maximum interception distance of the missile, the advantage value decays as an exponential function;
by combining formulas (4) and (5), the situation assessment function η of the air combat in which the UAV is located is obtained as:
η=ηA+ηB (6)
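The interception-zone part of the assessment, formulas (3)–(4), can be sketched as follows. This is an illustrative sketch under assumptions: the conical zone Ω is tested purely by distance and off-axis angle, and the half-angle and Re values are placeholders, not the patent's settings.

```python
import math

def in_intercept_zone(pos, vel, target_pos, d_max, half_angle):
    """True if target_pos lies in the conical interception zone: within
    distance d_max of pos and within half_angle of the velocity vector."""
    d = [t - p for t, p in zip(target_pos, pos)]        # distance vector D
    dist = math.sqrt(sum(c * c for c in d))
    if dist == 0.0 or dist > d_max:
        return False
    speed = math.sqrt(sum(c * c for c in vel))
    cos_alpha = sum(dc * vc for dc, vc in zip(d, vel)) / (dist * speed)
    return math.acos(max(-1.0, min(1.0, cos_alpha))) <= half_angle

def eta_interception(pos_u, vel_u, pos_t, vel_t, d_max, half_angle, re=5.0):
    """eta_A = eta_U - eta_T (formula (4)): Re when the target is inside the
    UAV's zone, minus Re when the UAV is inside the target's zone."""
    eta_u = re if in_intercept_zone(pos_u, vel_u, pos_t, d_max, half_angle) else 0.0
    eta_t = re if in_intercept_zone(pos_t, vel_t, pos_u, d_max, half_angle) else 0.0
    return eta_u - eta_t
```

For a target flying directly ahead of the UAV within range, only η_U fires, matching the tail-chase geometry of fig. 2.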
step 1-3: the state of the air combat maneuver decision model is composed of a set of variables capable of completely describing the air combat situation, as shown in fig. 2, the geometric relationship of the air combat situation at any moment is completely determined by the information contained in the unmanned aerial vehicle position vector, the unmanned aerial vehicle speed vector, the target position vector and the target speed vector in the same coordinate system, so that the description of the air combat situation is composed of the following 5 aspects:
1) speed information of the UAV, including the speed magnitude v_U, the flight-path angle γ_U, and the heading angle ψ_U;
2) speed information of the target, including the speed magnitude v_T, the flight-path angle γ_T, and the heading angle ψ_T;
3) the relative position relation between the UAV and the target, represented by the distance vector D; the modulus of the distance vector is D = ||D||; γ_D denotes the angle between the distance vector D and the horizontal plane o-x-y, and ψ_D denotes the angle between the projection of D on the horizontal plane o-x-y and the oy axis, so the relative position relation between the UAV and the target is represented by D, γ_D, and ψ_D;
4) the relative motion relation between the UAV and the target, comprising the angle α_U between the UAV velocity vector and the distance vector D and the angle α_T between the target velocity vector and the distance vector D;
5) the height information z_U of the UAV and the height information z_T of the target;
based on variables 1) to 5) above, the 1v1 air combat situation at any moment can be completely characterized, so the state space of the 1v1 maneuver decision model is the 13-dimensional vector space s:
s = [v_U, γ_U, ψ_U, v_T, γ_T, ψ_T, D, γ_D, ψ_D, α_U, α_T, z_U, z_T] (7)
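Assembling the 13-dimensional state of formula (7) from the two aircraft's kinematic tuples can be sketched as below. This is an illustrative sketch; the tuple layout `(x, y, z, v, gamma, psi)` is an assumption consistent with step 1-1.

```python
import math

def relative_state(uav, tgt):
    """Build the 13-dimensional state s of formula (7) from the kinematic
    tuples (x, y, z, v, gamma, psi) of the UAV and the target."""
    def velocity(v, gamma, psi):          # speed/angles -> Cartesian, oy = north
        return (v * math.cos(gamma) * math.sin(psi),
                v * math.cos(gamma) * math.cos(psi),
                v * math.sin(gamma))
    def angle(a, b):                      # angle between two 3-D vectors
        na = math.sqrt(sum(c * c for c in a))
        nb = math.sqrt(sum(c * c for c in b))
        dot = sum(x * y for x, y in zip(a, b))
        return math.acos(max(-1.0, min(1.0, dot / (na * nb))))
    d = (tgt[0] - uav[0], tgt[1] - uav[1], tgt[2] - uav[2])  # distance vector D
    dist = math.sqrt(sum(c * c for c in d))
    gamma_d = math.asin(d[2] / dist)      # angle of D with the o-x-y plane
    psi_d = math.atan2(d[0], d[1])        # angle of D's projection with the oy axis
    vu = velocity(uav[3], uav[4], uav[5])
    vt = velocity(tgt[3], tgt[4], tgt[5])
    return [uav[3], uav[4], uav[5], tgt[3], tgt[4], tgt[5],
            dist, gamma_d, psi_d, angle(vu, d), angle(vt, d), uav[2], tgt[2]]
```

For a target 5 km due north at the same altitude with both aircraft flying north, D = 5000 m and α_U = 0, as expected from fig. 2.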
the situation assessment function η is adopted as the reward value R of the air combat maneuver decision, reflecting the effect of an action value on the air combat situation through the assessment function, i.e., R = η;
step 1-4: in multi-aircraft air combat, let the number of UAVs be n, denoted UAV_i (i = 1, 2, …, n), and the number of targets be m, denoted Target_j (j = 1, 2, …, m); the number of targets is assumed not to exceed the number of UAVs, i.e., m ≤ n;
as shown in FIG. 3, as the numbers of UAVs and targets increase in multi-aircraft air combat, each UAV needs to take into account the relative states with all other aircraft (targets as well as friendly aircraft) when making maneuver decisions. The relative situation between a UAV and another aircraft in air combat can be fully described by the 13 variables of formula (7). Denote the relative state between any UAV_i and Target_j as s_ij, and the relative state between UAV_i and any other friendly aircraft UAV_k as s_ik. The observation state of any UAV_i in multi-aircraft air combat is then:

S_i = [∪ s_ij | j = 1, 2, …, m, ∪ s_ik | k = 1, 2, …, n (k ≠ i)] (8)
in the process of multi-aircraft air combat, each UAV makes its own maneuver decision according to its situation in the air combat environment; according to the UAV dynamic model of formula (2), the UAV controls its flight through the three variables n_x, n_z, and μ, so the action space of UAV_i is A_i = [n_xi, n_zi, μ_i];
In multi-aircraft cooperative air combat, the situation assessment values η_A and η_B between each UAV and each target are calculated according to formulas (4) and (5), respectively; denote the situation assessment values of UAV_i against Target_j as η_A^{ij} and η_B^{ij}. In addition, the distance between UAV_i and a friendly aircraft UAV_k must be considered: if they are too close, the risk of collision increases, so the situation assessment function between UAV_i and its friendly aircraft UAV_k is defined as:

η^{ik} = −P if D_ik < D_safe, and η^{ik} = 0 otherwise (9)

where D_ik is the distance between UAV_i and the friendly aircraft UAV_k, D_safe is the minimum safe distance between two UAVs, and P is a large positive number.
Step 2: establishing a multi-machine cooperative target distribution method, and determining a target distribution rule during reinforcement learning training;
in multi-aircraft cooperative air combat, from the overall perspective, maximizing the advantage of the UAV formation means that every enemy aircraft can be attacked by a UAV weapon; however, each UAV can only maneuver against one target at a time, so target allocation must be carried out alongside maneuver decision-making to achieve coordination of the tactical strategies.
Step 2-1: let n UAVs fight m targets in air combat, with n ≥ m; according to formula (6), the situation assessment value of UAV_i (i = 1, 2, …, n) against Target_j (j = 1, 2, …, m) is η^{ij};

let the target allocation matrix be X = [x_ij], where x_ij = 1 denotes that Target_j is allocated to UAV_i and x_ij = 0 denotes that it is not. In multi-aircraft air combat, several targets may lie in the attack zone of one UAV at the same time, so the UAV's multi-target attack capability must be considered in the allocation: each UAV can launch missiles at no more than L targets in its attack zone, i.e., Σ_j x_ij ≤ L. Meanwhile, no target may be overlooked or abandoned during the engagement, i.e., each target is allocated at least one UAV to attack it, so Σ_i x_ij ≥ 1; all UAVs are required to take part in the combat.
With maximization of the UAVs' situational advantage over the targets as the objective, the target allocation model is established as: maximize Σ_i Σ_j η^{ij} x_ij subject to the constraints Σ_j x_ij ≤ L and Σ_i x_ij ≥ 1 above.
step 2-2: the UAV performs a series of maneuvers in air combat to make a target enter its attack zone and launch weapons at it; in the allocation process, targets inside the attack zone are allocated first, followed by targets outside it, so the target allocation method consists of the following two parts:
step 2-2-1: preferentially distributing targets located in the attack area;
with η_A^{ij} and η_B^{ij} as elements, construct two n × m matrices H_A and H_B. From formula (3), if Target_j is in the attack zone of UAV_i, then η_A^{ij} = Re; otherwise it is not. Accordingly, set x_ij = 1 at every position (i, j) of H_A where η_A^{ij} = Re, i.e., where the target lies in the UAV's attack zone. During allocation, if the number x of targets in the attack zone of UAV_i exceeds the maximum number of attackable targets, i.e., x > L, sort the corresponding element values of UAV_i in the matrix H_B and allocate to UAV_i the L targets with the largest element values;
Step 2-2-2: allocating targets located outside the attack area;
for UAV_i, if a target inside its attack zone has already been allocated to it, it is allocated no target outside the attack zone; for several targets outside the attack zone, the UAV cannot maneuver so as to bring all of them into its attack zone, so a UAV may be allocated only one target that lies outside the attack zone. Therefore, after the allocation inside the attack zones is completed, the remaining allocation work reduces to assigning one target to each unallocated UAV, which is realized with the Hungarian algorithm, specifically:

first, according to the current target allocation matrix X = [x_ij]_{n×m}, delete from H_B the i-th row and j-th column of every element with x_ij = 1, obtaining the matrix H̄_B; based on H̄_B, the allocation result is calculated with the Hungarian algorithm; since n ≥ m and L > 0, the matrix is padded to square form (margin-complement method) so that the Hungarian algorithm can be completed and the target allocation realized, setting the corresponding x_ij = 1;

after the above two steps, all targets have been allocated and the target allocation matrix X = [x_ij]_{n×m} is obtained;
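The two-stage allocation of steps 2-2-1 and 2-2-2 can be sketched as below. This is an illustrative sketch under assumptions: in-zone membership is detected by η_A^{ij} = Re, and a brute-force search over permutations stands in for the Hungarian algorithm (adequate only for the small formations of the embodiment).

```python
from itertools import permutations

def allocate_targets(eta_a, eta_b, L, re=5.0):
    """Two-stage target allocation sketch (n UAVs >= m targets).

    eta_a, eta_b: n x m lists of situation values per formulas (4)-(5).
    Stage 1: assign in-zone targets (eta_a == re), at most L per UAV,
    keeping the L largest eta_b values. Stage 2: give each still-open
    target to a distinct free UAV, maximizing total eta_b.
    """
    n, m = len(eta_a), len(eta_a[0])
    x = [[0] * m for _ in range(n)]
    for i in range(n):                       # stage 1: targets already in zone
        in_zone = [j for j in range(m) if eta_a[i][j] == re]
        in_zone.sort(key=lambda j: eta_b[i][j], reverse=True)
        for j in in_zone[:L]:
            x[i][j] = 1
    free_uavs = [i for i in range(n) if not any(x[i])]
    open_tgts = [j for j in range(m) if not any(x[i][j] for i in range(n))]
    best, best_val = None, float("-inf")     # stage 2: one target per free UAV
    for perm in permutations(free_uavs, len(open_tgts)):
        val = sum(eta_b[i][j] for i, j in zip(perm, open_tgts))
        if val > best_val:
            best, best_val = perm, val
    for i, j in zip(best or (), open_tgts):
        x[i][j] = 1
    return x
```

In a 2v2 case where only UAV1 already has Target1 in its zone, UAV2 is matched to the remaining target, yielding the diagonal allocation.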
And step 3: designing a multi-machine cooperative maneuver strategy learning algorithm and determining a reinforcement learning training logic;
the multi-machine cooperative maneuver strategy learning algorithm comprises a strategy coordination mechanism and a strategy learning mechanism:
step 3-1: designing a strategy coordination mechanism;
the air combat confrontation is regarded as a competitive game between n UAVs and m targets, and the model is established in the framework of a stochastic game; a stochastic game can be represented by a tuple (S, A_1, …, A_n, B_1, …, B_m, T, r_1, …, r_n), where S denotes the state space of the current game, shared by all agents; A_i is the action space of UAV_i and B_i is the action space of Target_i; T: S × A^n × B^m → S denotes the deterministic transition function of the environment, and r_i denotes the reward value function of UAV_i; the action spaces of the drones within each formation in the cooperative air combat are identical, i.e., A_i = A for each UAV_i and B_j = B for each Target_j;
Whether the UAVs hold the advantage in the cooperative air combat confrontation is evaluated from the situations of all UAVs. The global reward value of the UAV formation is defined as the mean of the individual UAV reward values, namely:

r(s, a, b) = (1/n) Σ_{i=1}^{n} r_i(s, a, b) (11)
where r(s, a, b) denotes the reward value obtained by the UAV formation at time t when the environment state is s, the UAV formation takes action a ∈ A^n, and the target formation takes action b ∈ B^m;
the goal of the UAV formation is to learn a strategy that maximizes the expected discounted accumulation of reward values E[Σ_t λ^t r_t], where 0 < λ ≤ 1 is the discount factor; the stochastic game is thus transformed into a Markov decision problem:

Q*(s, a) = r(s, a) + λ Q*(s′, a_θ(s′)) (12)

where Q*(·) denotes the state–action value function for executing action a in state s, r(s, a) denotes the reward value for executing action a in state s, θ denotes the network parameters of the policy function, s′ denotes the state at the next time, and a_θ denotes the parameterized policy function;
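The bootstrapped target used later for the Critic update follows directly from this Bellman form. A minimal sketch, assuming the usual convention of dropping the bootstrap term at episode end:

```python
def td_target(reward, next_q, lam=0.95, done=False):
    """One-sample Bellman target in the spirit of formula (12):
    y = r(s, a) + lambda * Q(s', a_theta(s')), truncated at episode end."""
    return reward + (0.0 if done else lam * next_q)
```

For example, with λ = 0.5, a reward of 1 and a next-state value of 2 give a target of 2.0, while a terminal transition keeps only the reward.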
the global reward value defined by formula (11) reflects the situation of the whole UAV formation, but it cannot reflect the contribution of an individual UAV to the formation's cooperation. In fact, global coordination is driven by the goals of each individual; therefore, the reward value function of each UAV is defined as:

r_i(s, a, b) = Σ_j x_ij η^{ij} + Σ_{k≠i} η^{ik} (13)

where r_i(s, a, b) denotes the reward value obtained by UAV_i at time t when the environment state is s, the UAV formation takes action a ∈ A^n, and the target formation takes action b ∈ B^m; the term Σ_j x_ij η^{ij} characterizes the situational advantage of UAV_i relative to the target(s) allocated to it, and Σ_{k≠i} η^{ik} is a penalty term constraining the distance between UAV_i and its friendly aircraft;
based on formula (13), for the n individual UAVs there are n Bellman equations of the form (14), in which the policy function a_θ shares the same parameters θ:

Q_i(s, a) = r_i(s, a) + λ Q_i(s′, a_θ(s′)), i = 1, 2, …, n (14)

where Q_i(s, a) denotes the state–action value function of UAV_i for executing action a in state s, and r_i(s, a) denotes the reward value obtained by UAV_i for executing action a in state s;
in the learning and training process, the distribution of reward values defines each UAV's behavioral feedback with respect to target allocation, situational advantage, and safe collision avoidance; after training, strategy coordination is achieved, each UAV's behavior implicitly accounts for the behavior of its teammates, and centralized target allocation is no longer needed.
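The reward decomposition above can be sketched as two small functions: the per-UAV reward combining the assigned-target advantage with the friend-distance penalty of formula (9), and the formation mean of formula (11). This is an illustrative sketch; the list-based inputs and parameter defaults are assumptions.

```python
def uav_reward(assign_row, eta_to_targets, dist_to_friends,
               d_safe=200.0, p=10.0):
    """Individual reward in the spirit of formula (13).

    assign_row      : row i of the allocation matrix X (x_ij in {0, 1})
    eta_to_targets  : situation values eta^{ij} of UAV_i against each target
    dist_to_friends : distances D_ik to the other UAVs in the formation
    """
    advantage = sum(x * eta for x, eta in zip(assign_row, eta_to_targets))
    penalty = sum(-p for d in dist_to_friends if d < d_safe)  # formula (9)
    return advantage + penalty

def global_reward(individual_rewards):
    """Formation-level reward of formula (11): mean of the individual rewards."""
    return sum(individual_rewards) / len(individual_rewards)
```

A UAV holding a +4 advantage over its assigned target but flying 150 m from a teammate (closer than D_safe = 200 m) thus receives 4 − 10 = −6, discouraging the unsafe geometry.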
Step 3-2: designing a strategy learning mechanism;
the premise for realizing collective cooperation based on multi-agent reinforcement learning is information interaction among individuals; therefore, a bidirectional recurrent neural network (BRNN) is adopted to establish the multi-UAV maneuver decision model, ensuring information interaction among the UAVs and realizing coordination of the formation maneuver strategy;
the model is established as shown in fig. 4: the multi-UAV air combat maneuver decision model consists of an Actor network and a Critic network, where the Actor network is formed by connecting the Actor networks of the individual UAVs through the BRNN, and the Critic network is formed by connecting the Critic networks of the individual UAVs through the BRNN; the hidden layers of the policy network (Actor) and the Q network (Critic) in the single-UAV decision model are set as BRNN recurrent units in the multi-UAV model, and the BRNN is then unrolled according to the number of UAVs; the input of the multi-UAV air combat maneuver decision model is the current air combat situation, and the action values of all the UAVs are output;
since the model is built on a BRNN, the idea for learning the network parameters is to unroll the network into n sub-networks (one per UAV) to compute the backward gradients, then update the parameters with the back-propagation-through-time algorithm. Gradients propagate through each individual UAV's Q_i function and policy function; during learning, the individual reward value of each UAV influences its actions, and the resulting gradient information is back-propagated to update the model parameters.
Define the objective function of UAV_i as J_i(θ) = E_{s∼ρ^{a_θ}}[Σ_t λ^t r_i], the expected accumulation of the individual reward values r_i, where ρ^{a_θ} denotes the state distribution obtained under the state transition function T with action policy a_θ, which is generally stationary in an ergodic Markov decision process; the joint objective function of the n UAVs is denoted J(θ):

J(θ) = (1/n) Σ_{i=1}^{n} J_i(θ) (15)

according to the multi-agent deterministic policy gradient theorem, for the objective function J(θ) of the n UAVs in formula (15), the gradient with respect to the policy network parameters θ is
A parameterized Critic function Q_ξ(s, a) is used to estimate the state–action function in formula (16). When training the Critic, a sum-of-squares loss function is adopted; the gradient of the parameterized Critic function Q_ξ(s, a) is given by formula (17), where ξ are the parameters of the Q network:
based on formulas (16) and (17), the Actor and Critic networks are optimized with stochastic gradient descent; in the interactive learning process, the parameters are updated with the data obtained by trial and error, completing the learning and optimization of the cooperative air combat strategy;
step 3-3: according to the strategy coordination mechanism and the strategy learning mechanism, the reinforcement learning training process for determining the multi-unmanned aerial vehicle collaborative air combat maneuver decision model is as follows:
step 3-3-1: first, initialization: determine the forces and situations of both sides of the air combat, arranging n UAVs against m targets for the confrontation, with n ≥ m; randomly initialize the online network parameters θ of the Actor and ξ of the Critic, then assign the Actor and Critic online network parameters to the corresponding target networks, i.e., θ′ ← θ and ξ′ ← ξ, where θ′ and ξ′ are the parameters of the Actor and Critic target networks, respectively; initialize the experience pool R1 for storing the experience data obtained by exploratory interaction; initialize the random process ε used for action-value exploration;
step 3-3-2: determine the initial state of training, i.e., the relative situation of the two sides at the start of the air combat; set the initial position and speed information of each UAV in the UAV formation and the target formation, i.e., determine the (x, y, z, v, γ, ψ) information of each aircraft, and calculate the initial air combat state s_1 according to the definition of the state space; let t = 1;
step 3-3-3: repeatedly carry out multi-episode training from the initial state, performing the following operations in each single-episode air combat simulation:
firstly, according to the current air combat state s_t, the target allocation matrix X_t is calculated using the target allocation method; then each UAV_i generates an action value a_t^i according to the state s_t and the random process ε and executes it, while each Target_i in the target formation performs an action b_t^i; after execution the state transitions to s_{t+1}, the reward value r_t^i is calculated according to equation (13), and the transition tuple (s_t, a_t, r_t, s_{t+1}) is stored as one piece of experience data in the experience pool R1; during learning, a batch of M pieces of experience data is randomly sampled from the experience pool R1 and the target Q value of each UAV is calculated, that is, for each of the M pieces of data:
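The store-and-sample cycle of the experience pool R1 can be sketched as a small replay buffer. This is an illustrative sketch; the capacity default matches the 10^6 setting of the embodiment, and the transition layout is an assumption.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool R1: stores transition tuples (s_t, a_t, r_t, s_{t+1})
    and returns uniformly sampled mini-batches of size M for the update step."""

    def __init__(self, capacity=10**6):
        self.buffer = deque(maxlen=capacity)   # oldest experience evicted first

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

After a few environment steps, `sample(M)` returns M distinct stored transitions for computing the target Q values.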
the gradient estimate Δξ of the Critic is calculated according to equation (17);
the gradient estimate Δθ of the Actor is calculated according to equation (16);
the online network parameters of the Actor and the Critic are updated by an optimizer using the obtained gradient estimates Δξ and Δθ; after the online network optimization is completed, the target network parameters are updated in a soft-update manner, namely
Wherein κ ∈ (0, 1);
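The soft update θ′ ← κθ + (1 − κ)θ′ with κ ∈ (0, 1) can be sketched element-wise over flat parameter lists. This is an illustrative sketch; real frameworks apply the same rule per tensor.

```python
def soft_update(target_params, online_params, kappa=0.005):
    """Soft target-network update: theta' <- kappa * theta + (1 - kappa) * theta',
    applied element-wise; kappa in (0, 1) controls how slowly the target tracks."""
    return [kappa * w + (1.0 - kappa) * w_t
            for w, w_t in zip(online_params, target_params)]
```

With κ = 0.5 a target weight of 0 tracking an online weight of 1 moves halfway to 0.5 in one step; the embodiment's κ = 0.005 moves far more slowly, stabilizing the target Q values.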
step 3-3-4: after the single-episode simulation ends, if the set maximum number of episodes has been reached, stop the reinforcement learning training; otherwise, add 1 to t and repeat step 3-3-3.
The specific embodiment is as follows:
the method is applied to a two-UAV formation, specifically as follows:
1. and designing a multi-machine air combat environment model.
In the multi-aircraft air combat, the number of UAVs is set to 2, denoted UAV_i (i = 1, 2), and the number of targets to 2, denoted Target_j (j = 1, 2).
The observation state S_i of any UAV_i is obtained by the calculation of step 1;
In the multi-aircraft air combat process, each UAV makes its own maneuver decision according to its situation in the air combat environment; according to the UAV dynamic model of formula (2), the UAV controls its flight through the three variables n_x, n_z, and μ, so the action space of UAV_i is A_i = [n_xi, n_zi, μ_i].

In multi-aircraft cooperative air combat, the situation assessment values η_A and η_B between each UAV and each target are calculated according to formulas (4) and (5), denoted η_A^{ij} and η_B^{ij} for UAV_i against Target_j. In addition, the distance between UAV_i and its friendly aircraft UAV_k must be considered: if they are too close, the risk of collision increases, so the assessment function between UAV_i and UAV_k is given by formula (9).
2. Designing a multi-machine cooperative target distribution method.
Two UAVs fight 2 targets. According to formula (6), the situation assessment value of UAV_i (i = 1, 2) against Target_j (j = 1, 2) is η^{ij}.

The target allocation matrix X = [x_ij]_{n×m} is obtained by step 2.
3. And designing a multi-machine cooperative maneuver strategy learning algorithm.
The UAV is trained by reinforcement learning in an air combat scenario in which the UAV and the target aircraft fly toward each other and the target flies in uniform straight-line motion.
The air combat background of the multi-UAV cooperative air combat is set to close-range air combat, and the parameters of the air combat environment model are set as follows: the maximum interception distance of the missile is D_max = 3 km, with its field angle as defined in step 1-2; the minimum safe distance between two UAVs is D_safe = 200 m; the advantage value for intercepting a target is Re = 5 and the penalty value is P = 10; in the UAV motion model the maximum speed is v_max = 400 m/s and the minimum speed v_min = 90 m/s, with control parameters n_x ∈ [−1, 2], n_z ∈ [0, 8], and μ ∈ [−π, π].
The Actor network of the maneuver decision model consists of an input layer, hidden layers, and an output layer. The input layer takes the air combat state. There are 2 hidden layers: the 1st consists of 400 LSTM neurons in each of the forward and backward directions and is unrolled according to the number of UAVs in the bidirectional recurrent structure to form the communication layer; the 2nd consists of 100 neurons with a tanh activation function, with parameters randomly initialized from the uniform distribution [−3×10⁻⁴, 3×10⁻⁴]. The output layer outputs the 3 control quantities with a tanh activation function, with parameters randomly initialized from the uniform distribution [−2×10⁻⁵, 2×10⁻⁵]; the [−1, 1] output range of tanh is linearly rescaled to [−1, 2], [0, 8], and [−π, π], respectively.
The Critic network of the maneuver decision model likewise consists of an input layer, hidden layers, and an output layer. The input layer takes the air combat state and the 3 action values of the UAV. There are 2 hidden layers: the 1st consists of 500 LSTM neurons in each of the forward and backward directions and is unrolled according to the number of UAVs in the bidirectional recurrent structure to form the communication layer; the 2nd consists of 150 neurons with a tanh activation function, with parameters randomly initialized from the uniform distribution [−3×10⁻⁴, 3×10⁻⁴]. The output layer outputs a single Q value with a tanh activation function, with parameters randomly initialized from the uniform distribution [−2×10⁻⁴, 2×10⁻⁴]. Both the Actor and Critic models use Adam optimizers; the learning rate of the Actor network is set to 0.001 and that of the Critic network to 0.0001. The discount factor is λ = 0.95 and the soft-update factor of the target networks is κ = 0.005. The random process ε for action-value exploration uses the OU process. The size of the experience replay space R is set to 10⁶ and the batch size to 512.
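The Ornstein–Uhlenbeck process used for the exploration noise ε can be sketched as below. This is an illustrative sketch: the θ and σ defaults are commonly used values, not values stated in this document.

```python
import random

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise for the action values: a
    mean-reverting random walk with Gaussian perturbations."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = mu

    def sample(self, dt=1.0):
        # dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        dx = self.theta * (self.mu - self.state) * dt \
             + self.sigma * (dt ** 0.5) * random.gauss(0.0, 1.0)
        self.state += dx
        return self.state
```

With σ = 0 the process decays deterministically toward its mean, which exposes the mean-reversion term in isolation; with σ > 0 it produces temporally correlated noise well suited to continuous control actions.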
FIG. 5 shows the simulated air combat maneuver trajectories based on the learned strategies after training is complete. As can be seen in the figure, at the initial moment UAV1 and UAV2 fly head-on toward Target1 and Target2, respectively; according to the target allocation algorithm, UAV1 and UAV2 select Target1 and Target2 as their attack targets for maneuvering engagement. While approaching their respective targets, they adjust heading and altitude to avoid a possible collision at the crossing point. Around the moment of encounter, UAV1 turns to the right and UAV2 turns to the left, executing a crossing cover maneuver; after both UAVs have turned toward the opposite directions, they exchange their attack targets instead of continuing to turn in pursuit of their initially allocated targets, embodying tactical coordination. This demonstrates that, through reinforcement learning training, the two-UAV formation can learn an air combat maneuver strategy that realizes tactical coordination between the two aircraft and gains the advantage in air combat, without decomposing the multi-aircraft air combat into several 1v1 confrontations.
Claims (1)
1. A multi-unmanned aerial vehicle collaborative air combat maneuver decision method based on multi-agent reinforcement learning is characterized by comprising the following steps:
step 1: establishing a multi-machine air combat environment model, and defining a state space, an action space and a reward value for each unmanned aerial vehicle to make a maneuver decision in the multi-machine cooperative air combat process;
step 1-1: in the ground coordinate system, the ox axis points due east, the oy axis due north, and the oz axis vertically upward; the motion model of the UAV in the ground coordinate system is shown in formula (1):
in the ground coordinate system, the dynamic model of the unmanned aerial vehicle is shown as formula (2):
wherein (x, y, z) represents the position of the UAV in the ground coordinate system, v represents the UAV speed, and v_x, v_y and v_z respectively represent the components of the speed v on the three coordinate axes x, y and z; the flight-path angle γ represents the angle between the UAV velocity v and the horizontal plane o-x-y; the heading angle ψ represents the angle between the projection v′ of the UAV velocity v on the o-x-y plane and the oy axis; g represents the gravitational acceleration; [n_x, n_z, μ] are the control variables for the maneuver of the UAV: n_x is the overload in the velocity direction of the UAV, representing the thrust and deceleration actions; n_z is the overload in the pitch direction of the UAV, i.e. the normal overload; μ is the roll angle around the UAV velocity vector; the speed of the UAV is controlled through n_x, and the direction of the UAV velocity vector through n_z and μ, thereby controlling the UAV to perform maneuvering actions;
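Formulas (1) and (2) themselves are not reproduced in the text above; the sketch below integrates the standard three-degree-of-freedom point-mass equations that are conventionally written for exactly these variables (x, y, z, v, γ, ψ) and controls [n_x, n_z, μ], as an assumed stand-in for the patent's own formulas:

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]

def step_dynamics(state, control, dt=0.02):
    """One Euler step of a standard 3-DOF point-mass UAV model (assumed form).

    state   = (x, y, z, v, gamma, psi): position, speed, path angle, heading
    control = (nx, nz, mu): tangential overload, normal overload, roll angle
    """
    x, y, z, v, gamma, psi = state
    nx, nz, mu = control
    # kinematics: oy points north, psi is measured from the oy axis
    dx = v * math.cos(gamma) * math.sin(psi)
    dy = v * math.cos(gamma) * math.cos(psi)
    dz = v * math.sin(gamma)
    # dynamics driven by the overloads nx, nz and roll angle mu
    dv = G * (nx - math.sin(gamma))
    dgamma = (G / v) * (nz * math.cos(mu) - math.cos(gamma))
    dpsi = G * nz * math.sin(mu) / (v * math.cos(gamma))
    return (x + dx * dt, y + dy * dt, z + dz * dt,
            v + dv * dt, gamma + dgamma * dt, psi + dpsi * dt)

# level flight: nx = 0, nz*cos(mu) = 1 keeps v, gamma and z constant
s = (0.0, 0.0, 1000.0, 200.0, 0.0, 0.0)
s = step_dynamics(s, (0.0, 1.0, 0.0))
```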
step 1-2: the missile is set to have only rear-hemisphere (tail-chase) attack capability; in the interception zone of the missile, v_U and v_T represent the velocities of the UAV and the target respectively; D is the distance vector, representing the positional relation between the UAV and the target; α_U and α_T respectively represent the angle between the UAV velocity vector and the distance vector D and the angle between the target velocity vector and the distance vector D;
the maximum interception distance of the missile is set as D_m and its field-of-view angle as φ_m; the interception zone of the missile is a conical region Ω; the maneuvering objective of the UAV in the air combat is to make the target enter its own interception zone Ω_U while preventing itself from entering the target's interception zone Ω_T;
According to the definition of the missile interception zone, if the target is inside the interception zone of one side's missile, that side can launch a weapon to attack the target and is at an advantage; the advantage value η_U when the UAV intercepts the target is defined as:
wherein (x_T, y_T, z_T) represents the position coordinates of the target; Re is a positive number;
the advantage value η_T obtained by the target intercepting the UAV is defined as:
wherein (x_U, y_U, z_U) represents the position coordinates of the UAV;
in the air combat, the advantage value η_A obtained by the UAV based on the interception opportunity is defined as:
η_A = η_U − η_T    (4)
the advantage value η_B obtained based on the angle and distance parameters of the two sides is defined as:
the above formula shows that when the UAV tail-chases the target, the advantage value is η_B = 1; when the UAV is tail-chased by the target, the advantage value is η_B = −1; when the distance between the UAV and the target is greater than the maximum interception distance of the missile, the advantage value decays according to an exponential function;
combining formulas (4) and (5), the situation assessment function η of the air combat in which the UAV is located is obtained as:
η = η_A + η_B    (6)
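Since formulas (3) and (5) are not reproduced in the text, the sketch below uses an illustrative advantage function that merely matches the stated boundary behaviour (η_B = 1 in a tail-chase, η_B = -1 when tailed, exponential decay beyond D_m); the exact functional form is an assumption:

```python
import math

def eta_b(alpha_u, alpha_t, dist, d_m):
    """Illustrative angle/distance advantage with the stated boundary behaviour:
    +1 for a tail-chase (both angles 0), -1 when tailed (both angles pi),
    exponential decay once dist exceeds the maximum interception distance d_m.
    Stand-in for the patent's unreproduced formula (5)."""
    angle_term = 1.0 - (alpha_u + alpha_t) / math.pi
    decay = math.exp(-max(dist - d_m, 0.0) / d_m)
    return angle_term * decay

def eta_a(u_in_zone, t_in_zone):
    """Interception-opportunity advantage eta_A = eta_U - eta_T, with boolean
    zone-membership indicators standing in for formula (3)."""
    return float(u_in_zone) - float(t_in_zone)

def situation(alpha_u, alpha_t, dist, d_m, u_in_zone, t_in_zone):
    # formula (6): eta = eta_A + eta_B
    return eta_a(u_in_zone, t_in_zone) + eta_b(alpha_u, alpha_t, dist, d_m)
```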
step 1-3: the geometric relation of the air combat situation at any moment is completely determined by the information contained in the UAV position vector, the UAV velocity vector, the target position vector and the target velocity vector in the same coordinate system; the description of the air combat situation therefore consists of the following 5 aspects:
1) velocity information of the UAV, including the speed magnitude v_U, flight-path angle γ_U and heading angle ψ_U;
2) velocity information of the target, including the speed magnitude v_T, flight-path angle γ_T and heading angle ψ_T;
3) the relative position relation between the UAV and the target, represented by the distance vector D; D = ‖D‖ is the magnitude of the distance vector, γ_D represents the angle between the distance vector D and the horizontal plane o-x-y, and ψ_D represents the angle between the projection of the distance vector D on the horizontal plane o-x-y and the oy axis; the relative position relation between the UAV and the target is thus represented by D, γ_D and ψ_D;
4) the relative motion relation between the UAV and the target, comprising the angle α_U between the UAV velocity vector and the distance vector D and the angle α_T between the target velocity vector and the distance vector D;
5) the height information z_U of the UAV and the height information z_T of the target;
Based on the variables in 1) to 5) above, the 1v1 air combat situation at any time can be completely characterized, so the state space of the 1v1 maneuver decision model is the 13-dimensional vector space s:
s = [v_U, γ_U, ψ_U, v_T, γ_T, ψ_T, D, γ_D, ψ_D, α_U, α_T, z_U, z_T]    (7)
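The relative components of the state vector (7) can be computed from raw positions and velocities as follows; the angle conventions match the text (γ_D against the horizontal plane, ψ_D measured from the oy axis), while the function name and layout are illustrative:

```python
import math
import numpy as np

def relative_state(p_u, v_u, p_t, v_t):
    """Relative part of state (7): D, gamma_D, psi_D, alpha_U, alpha_T.
    p_*, v_* are 3-vectors with axes (x east, y north, z up) as in the text."""
    d_vec = np.asarray(p_t, float) - np.asarray(p_u, float)
    d = np.linalg.norm(d_vec)
    gamma_d = math.asin(d_vec[2] / d)        # angle to horizontal o-x-y plane
    psi_d = math.atan2(d_vec[0], d_vec[1])   # projection angle from the oy axis

    def angle_to(v):
        v = np.asarray(v, float)
        c = np.dot(v, d_vec) / (np.linalg.norm(v) * d)
        return math.acos(min(1.0, max(-1.0, c)))

    return d, gamma_d, psi_d, angle_to(v_u), angle_to(v_t)

# tail-chase geometry: target 3 km due north, both flying north
state = relative_state((0.0, 0.0, 1000.0), (0.0, 200.0, 0.0),
                       (0.0, 3000.0, 1000.0), (0.0, 200.0, 0.0))
```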
the situation assessment function η is adopted as the reward value R of the air combat maneuver decision, so that the effect of an action value on the air combat situation is reflected through the situation assessment function, i.e. R = η;
step 1-4: in the multi-aircraft air combat, let the number of UAVs be n, denoted UAV_i (i = 1, 2, …, n), and the number of targets be m, denoted Target_j (j = 1, 2, …, m); the number of targets is set to be no greater than the number of UAVs, i.e. m ≤ n;
denote the relative state between any UAV_i and Target_j as s_ij, and the relative state between UAV_i and any other friendly aircraft UAV_k as s_ik; the observation state of any UAV_i in the multi-aircraft air combat is then:
S_i = [∪ s_ij | j = 1, 2, …, m, ∪ s_ik | k = 1, 2, …, n (k ≠ i)]    (8)
in the multi-aircraft air combat process, each UAV makes its own maneuver decision according to its situation in the air combat environment; according to the UAV dynamic model shown in formula (2), the UAV controls its flight through the three variables n_x, n_z and μ, therefore UAV_i has the action space A_i = [n_xi, n_zi, μ_i];
in the multi-aircraft cooperative air combat, the situation assessment values η_A and η_B between each UAV and each target are calculated according to formulas (4) and (5); the situation assessment values of UAV_i with respect to Target_j are denoted η_A^ij and η_B^ij; in addition, the influence of the relative state of the friendly aircraft UAV_k on the own situation of UAV_i is considered, so the situation assessment function of UAV_i with respect to its friendly aircraft UAV_k is defined as:
wherein D_ik is the distance between UAV_i and its friendly aircraft UAV_k, D_safe is the minimum safe distance between two UAVs, and P is a positive number.
Step 2: establishing a multi-machine cooperative target distribution method, and determining a target distribution rule during reinforcement learning training;
step 2-1: in the air combat, n UAVs fight m targets, with n ≥ m; according to formula (6), the situation assessment value of UAV_i (i = 1, 2, …, n) with respect to Target_j (j = 1, 2, …, m) is denoted η_ij;
let the target allocation matrix be X = [x_ij], where x_ij = 1 denotes that Target_j is allocated to UAV_i and x_ij = 0 denotes that Target_j is not allocated to UAV_i; each UAV can simultaneously launch missiles at no more than L targets located in its attack zone, i.e. Σ_j x_ij ≤ L; at the same time, no target may be left out of the battle and abandoned unattacked, i.e. each target should be assigned at least one UAV, so Σ_i x_ij ≥ 1; and all UAVs are required to take part in the combat, so Σ_j x_ij ≥ 1;
taking the maximization of the UAVs' situational advantage over the targets as the objective, the target allocation model is established as:
step 2-2: in the target allocation process, targets inside the attack zones are allocated first, and targets outside the attack zones afterwards, so the target allocation method is divided into the following two parts:
step 2-2-1: preferentially allocate targets located inside the attack zones;
with η_A^ij and η_B^ij as elements, two n×m matrices H_A and H_B are constructed; from formula (3), the elements of H_A indicate whether Target_j is inside the attack zone of UAV_i; accordingly, for the positions of all elements of H_A corresponding to targets inside an attack zone, set x_ij = 1; during this allocation, if the number x of targets inside the attack zone of UAV_i exceeds the maximum number of attack targets of the UAV, i.e. x > L, the corresponding element values of UAV_i in the matrix H_B are sorted and the L targets with the largest element values are selected and allocated to UAV_i;
step 2-2-2: allocate targets located outside the attack zones;
for UAV_i, if a target inside its attack zone has already been allocated to it, it can no longer be allocated a target outside the attack zone; for several targets outside the attack zone, the UAV cannot maneuver so that all of them are inside its attack zone, so when the targets are outside the attack zone only one target can be allocated to the UAV; therefore, after the allocation of targets inside the attack zones is completed, the remaining allocation work becomes the process of allocating 1 target to each unallocated UAV, which is realized by the Hungarian algorithm as follows:
first, according to the current target allocation matrix X = [x_ij]_{n×m}, delete from H_B the i-th row and j-th column of every element with x_ij = 1, obtaining a matrix H_B′; based on H_B′, the allocation result is calculated by the Hungarian algorithm; since n ≥ m and L > 0, a margin-complementing (padding) method is adopted to complete the Hungarian algorithm and realize the target allocation, setting the corresponding x_ij = 1;
after the above two steps are completed, the allocation of all targets is finished and the target allocation matrix X = [x_ij]_{n×m} is obtained;
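For small n and m, the allocation objective can be illustrated by exhaustive search; this stand-in replaces the zone-priority stage and the Hungarian algorithm with brute force over permutations (assuming L = 1), so it is a sketch of the objective, not of the claimed two-stage procedure:

```python
from itertools import permutations

def allocate(h_b, m=None):
    """Exhaustively assign one distinct UAV to each target, maximizing total
    advantage. h_b[i][j] is UAV i's advantage over target j (the H_B matrix).
    Assumes n >= m and L = 1; leftover UAVs would then each be given one
    target in a second pass, as described in the text."""
    n = len(h_b)
    m = len(h_b[0]) if m is None else m
    best, best_x = float("-inf"), None
    for perm in permutations(range(n), m):  # perm[j] = UAV attacking target j
        score = sum(h_b[i][j] for j, i in enumerate(perm))
        if score > best:
            best = score
            x = [[0] * m for _ in range(n)]
            for j, i in enumerate(perm):
                x[i][j] = 1
            best_x = x
    return best_x, best

x_opt, total = allocate([[3, 1], [1, 3], [2, 2]])  # 3 UAVs, 2 targets
```

For realistic sizes the factorial search is replaced by the Hungarian algorithm, which solves the same assignment problem in polynomial time.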
Step 3: designing a multi-aircraft cooperative maneuver strategy learning algorithm and determining the reinforcement learning training logic;
the multi-machine cooperative maneuver strategy learning algorithm comprises a strategy coordination mechanism and a strategy learning mechanism:
step 3-1: designing a strategy coordination mechanism;
the air combat confrontation is regarded as a competitive game between the n UAVs and the m targets, and the model is established on the framework of a stochastic game; a stochastic game can be represented by a tuple (S, A_1, …, A_n, B_1, …, B_m, T, r_1, …, r_n); S represents the state space of the current game, shared by all agents; the action space of UAV_i is defined as A_i and that of Target_i as B_i; T: S × A^n × B^m → S denotes the deterministic transition function of the environment; r_i represents the reward value function of UAV_i; the action spaces of the aircraft within each formation in the cooperative air combat are identical, i.e. for UAV_i and Target_j there are A_i = A and B_j = B respectively;
the global reward value of the UAV formation is defined as the average of the reward values of all UAVs, i.e.:
wherein r(s, a, b) represents the reward value obtained by the UAV formation at time t when the environment state is s, the UAV formation takes action a ∈ A^n and the target formation takes action b ∈ B^m;
the goal of the UAV formation is to learn a strategy that maximizes the expectation of the discounted accumulation of reward values, where 0 < λ ≤ 1 is the discount factor; the stochastic game is thus transformed into a Markov decision problem:
wherein Q*(·) represents the state-action value function for executing action a in state s, r(s, a) represents the reward value for executing action a in state s, θ represents the network parameters of the policy function, s′ represents the state at the next time step, and a_θ represents the parameterized policy function;
the reward value function for each drone is defined as:
wherein r_i(s, a, b) represents the reward value obtained by UAV_i at time t when the environment state is s, the UAV formation takes action a ∈ A^n and the target formation takes action b ∈ B^m; the first term characterizes the situational advantage value of UAV_i relative to the target assigned to it, and the second term is a penalty term constraining the distance between UAV_i and its friendly aircraft;
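The structure described for formula (13), an own-advantage term plus a friendly-distance penalty, can be sketched as follows; the threshold form of the penalty (-P inside D_safe, 0 outside) is an assumed stand-in for the unreproduced formula (9):

```python
def safety_penalty(d_ik, d_safe, P=1.0):
    """Penalty constraining UAV i's distance to friendly UAV k: -P when
    closer than the minimum safe distance, else 0 (assumed form)."""
    return -P if d_ik < d_safe else 0.0

def individual_reward(eta_own, friend_distances, d_safe=100.0, P=1.0):
    """r_i = situational advantage w.r.t. the assigned target, plus the
    penalty terms for each friendly aircraft, following the structure
    described for formula (13); constants are illustrative."""
    return eta_own + sum(safety_penalty(d, d_safe, P) for d in friend_distances)
```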
based on formula (13), for the n individual UAVs there are n Bellman equations as shown in formula (14), in which the policy functions a_θ share the same parameters θ:
wherein Q_i(s, a) represents the state-action value function of UAV_i executing action a in state s, and r_i(s, a) represents the reward value obtained by UAV_i executing action a in state s;
step 3-2: designing a strategy learning mechanism;
establishing a multi-UAV maneuver decision model by adopting the bidirectional recurrent neural network (BRNN);
the multi-UAV air combat maneuver decision model consists of an Actor network and a Critic network; the Actor network is formed by connecting the Actor networks of all individual UAVs through the BRNN, and the Critic network is formed by connecting the Critic networks of all individual UAVs through the BRNN; the hidden layers of the policy network (Actor) and the Q network (Critic) in the single-UAV decision model are set as BRNN recurrent units of the multi-UAV air combat maneuver decision model, and the BRNN is then expanded according to the number of UAVs; the input of the multi-UAV air combat maneuver decision model is the current air combat situation, and the output is the action value of each UAV;
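The communication-layer idea, a bidirectional recurrence over the UAV axis rather than over time, can be sketched in plain NumPy; weight shapes, initialization and sizes here are illustrative, not the patent's 500-unit LSTM configuration:

```python
import numpy as np

def brnn_comm_layer(per_uav_features, Wf, Uf, Wb, Ub):
    """Bidirectional recurrent 'communication layer' over the UAV axis:
    each UAV's output sees a forward pass over UAVs 1..i and a backward
    pass over UAVs n..i, then the two hidden states are concatenated.
    per_uav_features: (n_uav, d_in); returns (n_uav, 2*d_hidden)."""
    n, _ = per_uav_features.shape
    d_h = Uf.shape[0]
    hf, hb = np.zeros((n, d_h)), np.zeros((n, d_h))
    h = np.zeros(d_h)
    for i in range(n):                  # forward sweep over the formation
        h = np.tanh(per_uav_features[i] @ Wf + h @ Uf)
        hf[i] = h
    h = np.zeros(d_h)
    for i in reversed(range(n)):        # backward sweep over the formation
        h = np.tanh(per_uav_features[i] @ Wb + h @ Ub)
        hb[i] = h
    return np.concatenate([hf, hb], axis=1)

rng = np.random.default_rng(0)
n_uav, d_in, d_h = 3, 13, 8             # sizes are illustrative
x = rng.standard_normal((n_uav, d_in))
out = brnn_comm_layer(x,
                      rng.standard_normal((d_in, d_h)) * 0.1,
                      rng.standard_normal((d_h, d_h)) * 0.1,
                      rng.standard_normal((d_in, d_h)) * 0.1,
                      rng.standard_normal((d_h, d_h)) * 0.1)
```

Because the recurrence runs across agents instead of time steps, the same weights handle any formation size, which is why the layer can be "expanded according to the number of UAVs".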
the objective function of UAV_i is defined as J_i(θ), representing the expectation of the accumulation of the individual reward values r_i under the stationary state distribution obtained with action policy a_θ and state transition function T in the ergodic Markov decision process; the objective function of the n UAVs is accordingly denoted J(θ):
according to the multi-agent deterministic policy gradient theorem, for the objective function J(θ) of the n UAVs described in formula (15), the gradient with respect to the policy network parameters θ is:
a parameterized Critic function Q_ξ(s, a) is used to estimate the state-action function in formula (16); when training the Critic, the sum-of-squares loss function is adopted, and the gradient of the parameterized Critic function Q_ξ(s, a) is shown in formula (17), where ξ is the parameter of the Q network:
based on formulas (16) and (17), the Actor and Critic networks are optimized by the stochastic gradient descent method; in the interactive learning process, the parameters are updated with the data obtained by trial and error, completing the learning and optimization of the cooperative air combat strategy;
step 3-3: according to the strategy coordination mechanism and the strategy learning mechanism, the reinforcement learning training process for determining the multi-unmanned aerial vehicle collaborative air combat maneuver decision model is as follows:
step 3-3-1: first perform initialization: determine the forces and situations of both sides of the air combat, arranging n UAVs against m targets for the air combat confrontation, with n ≥ m; randomly initialize the online network parameters θ of the Actor and ξ of the Critic, then assign the parameters of the Actor and Critic online networks to the corresponding target networks, i.e. θ′ ← θ and ξ′ ← ξ, where θ′ and ξ′ are the parameters of the Actor and Critic target networks respectively; initialize the experience pool R_1 for storing the experience data obtained by exploratory interaction; initialize a random process ε for realizing the exploration of action values;
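The experience pool R_1 described in this step is a standard replay buffer; a minimal sketch follows (capacity and transition layout are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool R_1: stores transitions, samples uniform random batches."""
    def __init__(self, capacity=10**6, seed=0):
        self.buf = deque(maxlen=capacity)   # oldest experience evicted first
        self.rng = random.Random(seed)

    def store(self, transition):
        # transition e.g. (s_t, uav_actions, target_actions, rewards, s_next)
        self.buf.append(transition)

    def sample(self, batch_size):
        return self.rng.sample(list(self.buf), batch_size)

pool = ReplayBuffer(capacity=1000)
for t in range(64):
    pool.store((t, "a", "b", 0.0, t + 1))
batch = pool.sample(32)
```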
step 3-3-2: determine the initial state of training, i.e. the relative situation of the two sides at the beginning of the air combat; set the initial position information and velocity information of each UAV in the UAV formation and the target formation, i.e. determine the (x, y, z, v, γ, ψ) information of each aircraft, and calculate the initial air combat state s_1 according to the definition of the state space; let t = 1;
step 3-3-3: repeatedly carry out multi-episode training from the initial state, executing the following operations in each single-episode air combat simulation:
first, according to the current air combat state s_t, the target allocation matrix X_t is calculated based on the target allocation method; then each UAV_i generates an action value a_t^i according to the state s_t and the random process ε and executes it, while each Target_i in the target formation executes an action b_t^i; after execution the state transitions to s_{t+1}, and the reward value r_t^i is calculated according to formula (13); the transition process variable (s_t, a_t, b_t, r_t, s_{t+1}) is stored as one piece of experience data in the experience pool R_1; during learning, a batch of M pieces of experience data is randomly sampled from the experience pool R_1 and the target Q value of each UAV is calculated, i.e. for each of the M pieces of data:
the gradient estimate Δξ of the Critic is calculated according to formula (17) as follows:
the gradient estimate Δθ of the Actor is calculated according to formula (16) as follows:
the online network parameters of the Actor and the Critic are updated with the optimizer according to the obtained gradient estimates Δξ and Δθ; after the online network optimization is completed, the target network parameters are updated in a soft-update manner, i.e. θ′ ← κθ + (1 − κ)θ′ and ξ′ ← κξ + (1 − κ)ξ′,
Wherein κ ∈ (0, 1);
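The soft update of the target network parameters with factor κ can be sketched as follows, treating a parameter set as a flat list of floats for illustration:

```python
def soft_update(target, online, kappa=0.005):
    """Polyak soft update: theta' <- kappa*theta + (1-kappa)*theta', elementwise.
    With small kappa the target network tracks the online network slowly,
    stabilizing the bootstrapped Q targets."""
    return [kappa * o + (1.0 - kappa) * t for t, o in zip(target, online)]

target_params = [0.0, 0.0]
online_params = [1.0, 2.0]
target_params = soft_update(target_params, online_params, kappa=0.5)  # [0.5, 1.0]
```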
step 3-3-4: after the single-episode simulation ends, if the set maximum number of episodes has been reached, stop the reinforcement learning training; otherwise add 1 to t and repeatedly execute step 3-3-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110318644.5A CN112947581B (en) | 2021-03-25 | 2021-03-25 | Multi-unmanned aerial vehicle collaborative air combat maneuver decision method based on multi-agent reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112947581A true CN112947581A (en) | 2021-06-11 |
CN112947581B CN112947581B (en) | 2022-07-05 |
Family
ID=76226772
Cited By (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255234A (en) * | 2021-06-28 | 2021-08-13 | 北京航空航天大学 | Method for carrying out online target distribution on missile groups |
CN113566831A (en) * | 2021-09-26 | 2021-10-29 | 中国人民解放军国防科技大学 | Unmanned aerial vehicle cluster navigation method, device and equipment based on human-computer interaction |
CN113625739A (en) * | 2021-08-25 | 2021-11-09 | 中国航空工业集团公司沈阳飞机设计研究所 | Expert system optimization method based on heuristic maneuver selection algorithm |
CN113791634A (en) * | 2021-08-22 | 2021-12-14 | 西北工业大学 | Multi-aircraft air combat decision method based on multi-agent reinforcement learning |
CN113805569A (en) * | 2021-09-23 | 2021-12-17 | 北京理工大学 | Multi-agent technology-based countermeasure system, method, terminal and storage medium |
CN113867178A (en) * | 2021-10-26 | 2021-12-31 | 哈尔滨工业大学 | Virtual and real migration training system for multi-robot confrontation |
CN113893539A (en) * | 2021-12-09 | 2022-01-07 | 中国电子科技集团公司第十五研究所 | Cooperative fighting method and device for intelligent agent |
CN113962012A (en) * | 2021-07-23 | 2022-01-21 | 中国科学院自动化研究所 | Unmanned aerial vehicle countermeasure strategy optimization method and device |
CN114167899A (en) * | 2021-12-27 | 2022-03-11 | 北京联合大学 | Unmanned aerial vehicle swarm cooperative countermeasure decision-making method and system |
CN114167756A (en) * | 2021-12-08 | 2022-03-11 | 北京航空航天大学 | Autonomous learning and semi-physical simulation verification method for cooperative air combat decision of multiple unmanned aerial vehicles |
CN114239392A (en) * | 2021-12-09 | 2022-03-25 | 南通大学 | Unmanned aerial vehicle decision model training method, using method, equipment and medium |
CN114326826A (en) * | 2022-01-11 | 2022-04-12 | 北方工业大学 | Multi-unmanned aerial vehicle formation transformation method and system |
CN114330115A (en) * | 2021-10-27 | 2022-04-12 | 中国空气动力研究与发展中心计算空气动力研究所 | Neural network air combat maneuver decision method based on particle swarm search |
CN114727407A (en) * | 2022-05-12 | 2022-07-08 | 中国科学院自动化研究所 | Resource allocation method, device and equipment |
CN114815882A (en) * | 2022-04-08 | 2022-07-29 | 北京航空航天大学 | Unmanned aerial vehicle autonomous formation intelligent control method based on reinforcement learning |
CN115097864A (en) * | 2022-06-27 | 2022-09-23 | 中国人民解放军海军航空大学 | Multi-machine formation task allocation method |
CN115113642A (en) * | 2022-06-02 | 2022-09-27 | 中国航空工业集团公司沈阳飞机设计研究所 | Multi-unmanned aerial vehicle space-time key feature self-learning cooperative confrontation decision-making method |
CN115238832A (en) * | 2022-09-22 | 2022-10-25 | 中国人民解放军空军预警学院 | CNN-LSTM-based air formation target intention identification method and system |
CN115268481A (en) * | 2022-07-06 | 2022-11-01 | 中国航空工业集团公司沈阳飞机设计研究所 | Unmanned aerial vehicle countermeasure strategy decision method and system |
CN115470894A (en) * | 2022-10-31 | 2022-12-13 | 中国人民解放军国防科技大学 | Unmanned aerial vehicle knowledge model time-sharing calling method and device based on reinforcement learning |
CN115755956A (en) * | 2022-11-03 | 2023-03-07 | 南京航空航天大学 | Unmanned aerial vehicle maneuver decision method and system driven by knowledge and data in cooperation |
CN115826627A (en) * | 2023-02-21 | 2023-03-21 | 白杨时代(北京)科技有限公司 | Method, system, equipment and storage medium for determining formation instruction |
CN116047984A (en) * | 2023-03-07 | 2023-05-02 | 北京全路通信信号研究设计院集团有限公司 | Consistency tracking control method, device, equipment and medium of multi-agent system |
CN116149348A (en) * | 2023-04-17 | 2023-05-23 | 四川汉科计算机信息技术有限公司 | Air combat maneuver system, control method and defense system control method |
CN116227361A (en) * | 2023-03-06 | 2023-06-06 | 中国人民解放军32370部队 | Intelligent body decision method and device |
CN116489193A (en) * | 2023-05-04 | 2023-07-25 | 中国人民解放军陆军工程大学 | Combat network self-adaptive combination method, device, equipment and medium |
CN116679742A (en) * | 2023-04-11 | 2023-09-01 | 中国人民解放军海军航空大学 | Multi-six-degree-of-freedom aircraft collaborative combat decision-making method |
CN116736883A (en) * | 2023-05-23 | 2023-09-12 | 天津大学 | Unmanned aerial vehicle cluster intelligent cooperative motion planning method |
CN116893690A (en) * | 2023-07-25 | 2023-10-17 | 西安爱生技术集团有限公司 | Unmanned aerial vehicle evasion attack input data calculation method based on reinforcement learning |
CN116974297A (en) * | 2023-06-27 | 2023-10-31 | 北京五木恒润科技有限公司 | Conflict resolution method and device based on multi-objective optimization, medium and electronic equipment |
CN117111640A (en) * | 2023-10-24 | 2023-11-24 | 中国人民解放军国防科技大学 | Multi-machine obstacle avoidance strategy learning method and device based on risk attitude self-adjustment |
CN117168468A (en) * | 2023-11-03 | 2023-12-05 | 安徽大学 | Multi-unmanned-ship deep reinforcement learning collaborative navigation method based on near-end strategy optimization |
CN117162102A (en) * | 2023-10-30 | 2023-12-05 | 南京邮电大学 | Independent near-end strategy optimization training acceleration method for robot joint action |
CN117313561A (en) * | 2023-11-30 | 2023-12-29 | 中国科学院自动化研究所 | Unmanned aerial vehicle intelligent decision model training method and unmanned aerial vehicle intelligent decision method |
CN113962012B (en) * | 2021-07-23 | 2024-05-24 | 中国科学院自动化研究所 | Unmanned aerial vehicle countermeasure strategy optimization method and device |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007080584A2 (en) * | 2006-01-11 | 2007-07-19 | Carmel-Haifa University Economic Corp. Ltd. | Uav decision and control system |
CN108319286A (en) * | 2018-03-12 | 2018-07-24 | 西北工业大学 | A kind of unmanned plane Air Combat Maneuvering Decision Method based on intensified learning |
CN111260031A (en) * | 2020-01-14 | 2020-06-09 | 西北工业大学 | Unmanned aerial vehicle cluster target defense method based on deep reinforcement learning |
CN111523177A (en) * | 2020-04-17 | 2020-08-11 | 西安科为实业发展有限责任公司 | Air combat countermeasure autonomous decision method and system based on intelligent learning |
CN111880563A (en) * | 2020-07-17 | 2020-11-03 | 西北工业大学 | Multi-unmanned aerial vehicle task decision method based on MADDPG |
CN111880565A (en) * | 2020-07-22 | 2020-11-03 | 电子科技大学 | Q-Learning-based cluster cooperative countermeasure method |
CN112052456A (en) * | 2020-08-31 | 2020-12-08 | 浙江工业大学 | Deep reinforcement learning strategy optimization defense method based on multiple intelligent agents |
CN112051863A (en) * | 2020-09-25 | 2020-12-08 | 南京大学 | Unmanned aerial vehicle autonomous anti-reconnaissance and enemy attack avoidance method |
CN112180967A (en) * | 2020-04-26 | 2021-01-05 | 北京理工大学 | Multi-unmanned aerial vehicle cooperative countermeasure decision-making method based on evaluation-execution architecture |
CN112182977A (en) * | 2020-10-12 | 2021-01-05 | 中国人民解放军国防科技大学 | Control method and system for cooperative game confrontation of unmanned cluster |
Non-Patent Citations (4)
Title |
---|
WEIREN KONG, et al.: "Maneuver Strategy Generation of UCAV for within Visual Range Air Combat Based on Multi-Agent Reinforcement Learning and Target Position Prediction", MDPI * |
DING Linjing, et al.: "Maneuver decision for UAV air combat based on reinforcement learning", Avionics Technology * |
LIU Qiang, et al.: "Research on group confrontation strategy based on deep reinforcement learning", Intelligent Computer and Applications * |
XIE Jianfeng, et al.: "Research on UAV air combat maneuver decision based on reinforced genetic algorithm", Journal of Northwestern Polytechnical University * |
Cited By (53)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255234A (en) * | 2021-06-28 | 2021-08-13 | 北京航空航天大学 | Method for carrying out online target distribution on missile groups |
CN113962012B (en) * | 2021-07-23 | 2024-05-24 | 中国科学院自动化研究所 | Unmanned aerial vehicle countermeasure strategy optimization method and device |
CN113962012A (en) * | 2021-07-23 | 2022-01-21 | 中国科学院自动化研究所 | Unmanned aerial vehicle countermeasure strategy optimization method and device |
CN113791634B (en) * | 2021-08-22 | 2024-02-02 | 西北工业大学 | Multi-agent reinforcement learning-based multi-machine air combat decision method |
CN113791634A (en) * | 2021-08-22 | 2021-12-14 | 西北工业大学 | Multi-aircraft air combat decision method based on multi-agent reinforcement learning |
CN113625739A (en) * | 2021-08-25 | 2021-11-09 | 中国航空工业集团公司沈阳飞机设计研究所 | Expert system optimization method based on heuristic maneuver selection algorithm |
CN113805569A (en) * | 2021-09-23 | 2021-12-17 | 北京理工大学 | Multi-agent technology-based countermeasure system, method, terminal and storage medium |
CN113805569B (en) * | 2021-09-23 | 2024-03-26 | 北京理工大学 | Countermeasure system, method, terminal and storage medium based on multi-agent technology |
CN113566831A (en) * | 2021-09-26 | 2021-10-29 | 中国人民解放军国防科技大学 | Unmanned aerial vehicle cluster navigation method, device and equipment based on human-computer interaction |
CN113867178A (en) * | 2021-10-26 | 2021-12-31 | 哈尔滨工业大学 | Virtual and real migration training system for multi-robot confrontation |
CN113867178B (en) * | 2021-10-26 | 2022-05-31 | 哈尔滨工业大学 | Virtual and real migration training system for multi-robot confrontation |
CN114330115A (en) * | 2021-10-27 | 2022-04-12 | 中国空气动力研究与发展中心计算空气动力研究所 | Neural network air combat maneuver decision method based on particle swarm search |
CN114167756A (en) * | 2021-12-08 | 2022-03-11 | 北京航空航天大学 | Autonomous learning and semi-physical simulation verification method for cooperative air combat decision of multiple unmanned aerial vehicles |
CN114167756B (en) * | 2021-12-08 | 2023-06-02 | 北京航空航天大学 | Multi-unmanned aerial vehicle collaborative air combat decision autonomous learning and semi-physical simulation verification method |
CN114239392A (en) * | 2021-12-09 | 2022-03-25 | 南通大学 | Unmanned aerial vehicle decision model training method, using method, equipment and medium |
CN113893539B (en) * | 2021-12-09 | 2022-03-25 | 中国电子科技集团公司第十五研究所 | Cooperative fighting method and device for intelligent agent |
CN113893539A (en) * | 2021-12-09 | 2022-01-07 | 中国电子科技集团公司第十五研究所 | Cooperative fighting method and device for intelligent agent |
CN114167899A (en) * | 2021-12-27 | 2022-03-11 | 北京联合大学 | Unmanned aerial vehicle swarm cooperative countermeasure decision-making method and system |
CN114167899B (en) * | 2021-12-27 | 2023-05-26 | 北京联合大学 | Unmanned aerial vehicle swarm cooperative countermeasure decision-making method and system |
CN114326826A (en) * | 2022-01-11 | 2022-04-12 | 北方工业大学 | Multi-unmanned aerial vehicle formation transformation method and system |
CN114815882A (en) * | 2022-04-08 | 2022-07-29 | 北京航空航天大学 | Unmanned aerial vehicle autonomous formation intelligent control method based on reinforcement learning |
CN114727407B (en) * | 2022-05-12 | 2022-08-26 | 中国科学院自动化研究所 | Resource allocation method, device and equipment |
CN114727407A (en) * | 2022-05-12 | 2022-07-08 | 中国科学院自动化研究所 | Resource allocation method, device and equipment |
CN115113642A (en) * | 2022-06-02 | 2022-09-27 | 中国航空工业集团公司沈阳飞机设计研究所 | Multi-unmanned aerial vehicle space-time key feature self-learning cooperative confrontation decision-making method |
CN115097864A (en) * | 2022-06-27 | 2022-09-23 | 中国人民解放军海军航空大学 | Multi-machine formation task allocation method |
CN115268481A (en) * | 2022-07-06 | 2022-11-01 | 中国航空工业集团公司沈阳飞机设计研究所 | Unmanned aerial vehicle countermeasure strategy decision method and system |
CN115238832B (en) * | 2022-09-22 | 2022-12-02 | 中国人民解放军空军预警学院 | CNN-LSTM-based air formation target intention identification method and system |
CN115238832A (en) * | 2022-09-22 | 2022-10-25 | 中国人民解放军空军预警学院 | CNN-LSTM-based air formation target intention identification method and system |
CN115470894A (en) * | 2022-10-31 | 2022-12-13 | 中国人民解放军国防科技大学 | Unmanned aerial vehicle knowledge model time-sharing calling method and device based on reinforcement learning |
CN115755956A (en) * | 2022-11-03 | 2023-03-07 | 南京航空航天大学 | Unmanned aerial vehicle maneuver decision method and system driven by knowledge and data in cooperation |
CN115755956B (en) * | 2022-11-03 | 2023-12-15 | 南京航空航天大学 | Unmanned aerial vehicle maneuver decision method and system cooperatively driven by knowledge and data |
CN115826627A (en) * | 2023-02-21 | 2023-03-21 | 白杨时代(北京)科技有限公司 | Method, system, equipment and storage medium for determining formation instruction |
CN116227361A (en) * | 2023-03-06 | 2023-06-06 | 中国人民解放军32370部队 | Intelligent body decision method and device |
CN116227361B (en) * | 2023-03-06 | 2023-08-15 | 中国人民解放军32370部队 | Intelligent body decision method and device |
CN116047984A (en) * | 2023-03-07 | 2023-05-02 | 北京全路通信信号研究设计院集团有限公司 | Consistency tracking control method, device, equipment and medium of multi-agent system |
CN116679742A (en) * | 2023-04-11 | 2023-09-01 | 中国人民解放军海军航空大学 | Multi-six-degree-of-freedom aircraft collaborative combat decision-making method |
CN116679742B (en) * | 2023-04-11 | 2024-04-02 | 中国人民解放军海军航空大学 | Multi-six-degree-of-freedom aircraft collaborative combat decision-making method |
CN116149348B (en) * | 2023-04-17 | 2023-06-23 | 四川汉科计算机信息技术有限公司 | Air combat maneuver system, control method and defense system control method |
CN116149348A (en) * | 2023-04-17 | 2023-05-23 | 四川汉科计算机信息技术有限公司 | Air combat maneuver system, control method and defense system control method |
CN116489193B (en) * | 2023-05-04 | 2024-01-23 | 中国人民解放军陆军工程大学 | Combat network self-adaptive combination method, device, equipment and medium |
CN116489193A (en) * | 2023-05-04 | 2023-07-25 | 中国人民解放军陆军工程大学 | Combat network self-adaptive combination method, device, equipment and medium |
CN116736883A (en) * | 2023-05-23 | 2023-09-12 | 天津大学 | Unmanned aerial vehicle cluster intelligent cooperative motion planning method |
CN116736883B (en) * | 2023-05-23 | 2024-03-08 | 天津大学 | Unmanned aerial vehicle cluster intelligent cooperative motion planning method |
CN116974297B (en) * | 2023-06-27 | 2024-01-26 | 北京五木恒润科技有限公司 | Conflict resolution method and device based on multi-objective optimization, medium and electronic equipment |
CN116974297A (en) * | 2023-06-27 | 2023-10-31 | 北京五木恒润科技有限公司 | Conflict resolution method and device based on multi-objective optimization, medium and electronic equipment |
CN116893690A (en) * | 2023-07-25 | 2023-10-17 | 西安爱生技术集团有限公司 | Unmanned aerial vehicle evasion attack input data calculation method based on reinforcement learning |
CN117111640A (en) * | 2023-10-24 | 2023-11-24 | 中国人民解放军国防科技大学 | Multi-machine obstacle avoidance strategy learning method and device based on risk attitude self-adjustment |
CN117111640B (en) * | 2023-10-24 | 2024-01-16 | 中国人民解放军国防科技大学 | Multi-machine obstacle avoidance strategy learning method and device based on risk attitude self-adjustment |
CN117162102A (en) * | 2023-10-30 | 2023-12-05 | 南京邮电大学 | Independent near-end strategy optimization training acceleration method for robot joint action |
CN117168468A (en) * | 2023-11-03 | 2023-12-05 | 安徽大学 | Multi-unmanned-ship deep reinforcement learning collaborative navigation method based on near-end strategy optimization |
CN117168468B (en) * | 2023-11-03 | 2024-02-06 | 安徽大学 | Multi-unmanned-ship deep reinforcement learning collaborative navigation method based on near-end strategy optimization |
CN117313561B (en) * | 2023-11-30 | 2024-02-13 | 中国科学院自动化研究所 | Unmanned aerial vehicle intelligent decision model training method and unmanned aerial vehicle intelligent decision method |
CN117313561A (en) * | 2023-11-30 | 2023-12-29 | 中国科学院自动化研究所 | Unmanned aerial vehicle intelligent decision model training method and unmanned aerial vehicle intelligent decision method |
Also Published As
Publication number | Publication date |
---|---|
CN112947581B (en) | 2022-07-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112947581B (en) | Multi-unmanned aerial vehicle collaborative air combat maneuver decision method based on multi-agent reinforcement learning | |
CN111880563B (en) | Multi-unmanned aerial vehicle task decision method based on MADDPG | |
Yang et al. | Maneuver decision of UAV in short-range air combat based on deep reinforcement learning | |
WO2021174765A1 (en) | Control system based on multi-unmanned-aerial-vehicle collaborative game confrontation | |
CN108319286B (en) | Unmanned aerial vehicle air combat maneuver decision method based on reinforcement learning | |
Jiandong et al. | UAV cooperative air combat maneuver decision based on multi-agent reinforcement learning | |
CN112180967B (en) | Multi-unmanned aerial vehicle cooperative countermeasure decision-making method based on evaluation-execution architecture | |
CN112902767B (en) | Multi-missile time-collaborative guidance method and system | |
CN113791634A (en) | Multi-aircraft air combat decision method based on multi-agent reinforcement learning | |
CN112198892B (en) | Multi-unmanned aerial vehicle intelligent cooperative penetration countermeasure method | |
CN113095481A (en) | Air combat maneuver method based on parallel self-game | |
CN114489144B (en) | Unmanned aerial vehicle autonomous maneuver decision method and device and unmanned aerial vehicle | |
CN112906233B (en) | Distributed near-end strategy optimization method based on cognitive behavior knowledge and application thereof | |
CN111859541B (en) | PMADDPG multi-unmanned aerial vehicle task decision method based on transfer learning improvement | |
CN114063644B (en) | Unmanned fighter plane air combat autonomous decision-making method based on pigeon flock reverse countermeasure learning | |
CN114460959A (en) | Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game | |
CN113282061A (en) | Unmanned aerial vehicle air game countermeasure solving method based on course learning | |
CN114167756B (en) | Multi-unmanned aerial vehicle collaborative air combat decision autonomous learning and semi-physical simulation verification method | |
CN113741186B (en) | Double-aircraft air combat decision-making method based on near-end strategy optimization | |
CN116700079A (en) | Unmanned aerial vehicle countermeasure occupation maneuver control method based on AC-NFSP | |
Wu et al. | Visual range maneuver decision of unmanned combat aerial vehicle based on fuzzy reasoning | |
Duan et al. | Autonomous maneuver decision for unmanned aerial vehicle via improved pigeon-inspired optimization | |
CN116796843A (en) | Unmanned aerial vehicle many-to-many chase game method based on PSO-M3DDPG | |
CN116225065A (en) | Unmanned aerial vehicle collaborative pursuit method with multi-degree-of-freedom model based on multi-agent reinforcement learning | |
Guo et al. | Maneuver decision of UAV in air combat based on deterministic policy gradient |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||