CN114460959A - Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game
- Publication number: CN114460959A (application CN202111534368.2A)
- Authority: CN (China)
- Prior art keywords: unmanned aerial vehicle, target, action, state
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
- G05D1/104—Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
Abstract
The invention discloses a cooperative autonomous decision-making method and device for an unmanned aerial vehicle group based on a multi-body game, wherein the method comprises the following steps: establishing a confrontation model of the unmanned aerial vehicle group and the target, the model comprising motion models of the unmanned aerial vehicle and the target, a maneuver library for both opposing parties, and a maneuver attack-and-defense library; taking the two opposing parties as agents and constructing a stochastic game model under the two-player zero-sum condition; and solving the stochastic game model with deep reinforcement learning to obtain the optimal strategy. Based on the intelligent decision-making functions of the multi-body game and deep reinforcement learning, the invention can automatically select countermeasures against various kinds of enemy interference, meeting the anti-interference application requirements of the unmanned aerial vehicle cluster in a confrontation environment. The unmanned aerial vehicle cluster can independently select anti-interference means corresponding to the defense means adopted by an enemy target, realizing a cooperative anti-interference countermeasure function and improving the task execution capability, survivability, and cooperative countermeasure effectiveness of the intelligent unmanned aerial vehicle cluster.
Description
Technical Field
The invention belongs to the technical field of aircraft control, and particularly relates to a cooperative autonomous decision-making method and device for an unmanned aerial vehicle group based on multi-body game.
Background
In order to realize autonomous decision-making about task-execution means when an unmanned aerial vehicle cluster faces a complex countermeasure environment and multiple-target confrontation, the selection of means can be cast as a task allocation problem. For example, in the cooperative anti-interference process, lateral maneuvering, electronic countermeasures, towed-decoy release, and the like can be regarded as tasks; anti-interference tasks are allocated to unmanned aerial vehicle cluster resources, and several unmanned aerial vehicles are selected to execute different anti-interference tasks.
The traditional approach formulates the autonomous decision problem as a task-allocation-oriented multiple traveling salesman problem (MTSP) and solves the resulting model with Mixed Integer Linear Programming (MILP). In addition, to achieve more reasonable task allocation, dynamic task time constraints and vehicle task-capability constraints can be introduced to establish an extended multi-vehicle cooperative task allocation model, while path and time optimization is performed on the MTSP to build an MTSP numerical planning model. On the basis of the MILP model, the multi-task allocation problem of heterogeneous multiple unmanned aerial vehicles can be added, yielding an improved MILP formulation.
Algorithms for solving the task allocation problem based on such task models mainly divide into optimization methods and heuristic methods. The Hungarian algorithm is the most common optimization method and can be generalized to multi-target allocation. Heuristic methods trade solution quality against computation time, aiming to obtain a satisfactory solution within a given time budget. Swarm algorithms, including the ant colony algorithm and particle swarm optimization, are currently the most widely applied heuristics; they imitate the flocking behaviors of birds, insects, and fish in nature.
Disclosure of Invention
The invention aims to provide a multi-body-game-based cooperative autonomous decision-making method and device for an unmanned aerial vehicle cluster, realizing game decisions under incomplete information and meeting the dual requirements of robustness and real-time performance placed on the autonomous decision-making function in cooperative tasks.
In order to achieve the purpose, the invention adopts the following technical scheme:
according to the 1st aspect of the invention, a cooperative autonomous decision-making method for an unmanned aerial vehicle cluster based on a multi-body game is disclosed, comprising the following steps:
constructing a confrontation model of the unmanned aerial vehicle cluster and the target, wherein the confrontation model comprises motion models of the unmanned aerial vehicle and the target, a maneuver library of both opposing parties, and a maneuver attack-and-defense library;
taking the two opposing parties as agents, constructing a stochastic game model under the two-player zero-sum game condition;
and solving the stochastic game model with deep reinforcement learning to obtain the optimal strategy.
In some other examples, the motion models of the drone and the target are respectively expressed by a particle motion equation, and the parameters for representing the confrontation situation of the drone cluster and the target include position coordinates, speed, relative distance, azimuth angle, and target incident angle of the confrontation parties.
In some other examples, the stochastic game model includes a state S consisting of the position coordinates (x, y, z), velocity v, relative distance R, azimuth angle φ, and target incident angle q of the opposing parties, expressed as S = {x, y, z, v, R, φ, q}.
in some other examples, the random game model includes a space of motion a of the dronepComprising 11 actions, an action space A of the objectTIncluding 5 actions.
In some other examples, in the stochastic game model the advantage reward function is:
in the formula, r_{P_iT} represents the advantage situation reward of unmanned aerial vehicle P_i with respect to the target T, Δd represents the Euclidean distance between the two parties, Δh represents the height difference between the two parties, φ represents the azimuth angle of unmanned aerial vehicle P_i with respect to the target T, and q denotes the target incident angle of unmanned aerial vehicle P_i.
In other examples, the current state s, the action a taken by the drone, the action o taken by the target, the corresponding reward value r, and the next state s′ reached after executing the actions are stored as a quintuple {s, a, o, r, s′} in a memory bank; data of a certain size is randomly extracted from the memory bank as training samples, and the target Q value is calculated to train the neural network.
In some other examples, solving the stochastic game model using deep reinforcement learning includes:
step S31: setting an initial state of both parties, initializing a memory bank, and setting an observed value;
step S32: creating a Q network and a target network, wherein the Q network parameters are θ and the target network parameters are θ⁻; the input of the neural network is the state s and the output is the action-state value function Q; after a certain number of learning steps, the Q network parameters are copied to the target network;
step S33: the following loop traversal process is performed:
s331: the unmanned aerial vehicle selects an action a according to the current state s and the strategy π, and executes it to obtain the next state s′ and the reward r; the action o selected by the target in state s is observed, and the quintuple {s, a, o, r, s′} is stored in the memory bank;
s332: randomly extracting partial data from the memory bank as training samples, taking the s′ value of a training sample as the input of the neural network, and obtaining Q[s′] for state s′ from the network output;
s333: obtaining the minimax state value V[s′] using linear programming, and calculating the target Q value target_Q;
s334: calculating a loss function, optimizing by adopting a gradient descent method, and updating Q network parameters;
step S34: and performing linear programming solution by using the Q value output by the trained neural network to obtain an optimal strategy pi.
According to the 2nd aspect of the invention, a cooperative autonomous decision-making device for an unmanned aerial vehicle cluster based on a multi-body game is disclosed, comprising: a processor, a memory, and a program stored on the memory and runnable on the processor, wherein the program, when executed by the processor, implements the cooperative autonomous decision-making method according to the above scheme.
According to the 3rd aspect of the present invention, a non-transitory storage medium is disclosed, storing a computer program which, when executed by a processor, implements the cooperative autonomous decision-making method according to the above aspects.
By adopting the invention, based on the intelligent decision-making functions of the multi-body game and deep reinforcement learning, the unmanned aerial vehicle can independently select countermeasures against various kinds of enemy interference, meeting the anti-interference application requirements of the unmanned aerial vehicle cluster in a confrontation environment. The unmanned aerial vehicle cluster can independently select anti-interference means corresponding to the defense means adopted by an enemy target, realizing a cooperative anti-interference countermeasure function and improving the task execution capability, survivability, and cooperative countermeasure effectiveness of the intelligent unmanned aerial vehicle cluster.
Drawings
Fig. 1 is a schematic flow chart of a cooperative autonomous decision-making method for a multi-body game-based unmanned aerial vehicle cluster according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a relative situation relationship between the unmanned aerial vehicle and a target;
FIG. 3 is a schematic diagram of an unmanned aerial vehicle expanding basic maneuver library;
FIG. 4 is a schematic diagram of a target defense model;
FIG. 5 is a maneuver decision flow according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart of the Minimax-DQN algorithm according to an embodiment of the present invention;
fig. 7 is a schematic composition diagram of a cooperative autonomous decision-making device for a multi-body game-based unmanned aerial vehicle fleet according to an embodiment of the present invention.
Detailed Description
Hereinafter, the cooperative autonomous decision method and apparatus for an unmanned aerial vehicle fleet according to the present invention will be described in detail with reference to the accompanying drawings and embodiments, but the present invention is not limited to these embodiments.
According to an embodiment of the invention, a cooperative autonomous decision-making method for an unmanned aerial vehicle cluster based on a multi-body game is disclosed; as shown in fig. 1, the method comprises the following steps:
step S1, constructing a confrontation model of the unmanned aerial vehicle cluster and the target;
the method specifically comprises the following steps:
step S11, constructing an unmanned aerial vehicle model;
the unmanned aerial vehicle adopts a three-degree-of-freedom particle motion equation, and the motion model is as follows:
Vx = V·cosθ·cosψ
Vy = V·sinθ
Vz = V·cosθ·sinψ
wherein θ is the track inclination angle, ψ is the track deflection angle, V is the unmanned aerial vehicle speed, Vx, Vy, and Vz are the velocity components in each direction, and x, y, and z are the centroid coordinates.
Step S12, constructing a target model;
the target adopts a three-degree-of-freedom particle motion equation, and the motion model is as follows:
Vx = V·cosψ
Vy = V·sinψ
in the formula, ψ is the heading angle, V is the target speed, Vx and Vy are the velocity components in each direction, and x and y are the centroid coordinates.
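As a minimal sketch, the particle motion models above can be advanced with a simple Euler integration step. The function names, the symbol psi for the deflection/heading angle, and the flat-Earth axis convention are illustrative assumptions, not taken from the patent.

```python
import math

def drone_step(x, y, z, v, theta, psi, dt):
    """One Euler step of the drone's 3-DOF particle motion.

    theta: track inclination angle, psi: track deflection angle (assumed symbol),
    v: speed. Velocity components follow the model in the description:
        Vx = V*cos(theta)*cos(psi), Vy = V*sin(theta), Vz = V*cos(theta)*sin(psi)
    """
    vx = v * math.cos(theta) * math.cos(psi)
    vy = v * math.sin(theta)
    vz = v * math.cos(theta) * math.sin(psi)
    return x + vx * dt, y + vy * dt, z + vz * dt

def target_step(x, y, v, psi, dt):
    """One Euler step of the target's planar particle motion (heading angle psi)."""
    return x + v * math.cos(psi) * dt, y + v * math.sin(psi) * dt
```

With level flight (theta = psi = 0), the drone simply advances along x at speed V, which is a quick sanity check on the axis convention assumed here.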
Step S13, acquiring relative situation parameters of the two confrontation parties;
fig. 2 is a schematic diagram of the confrontation situation between the unmanned aerial vehicle cluster and the target. In the figure, P_i represents the i-th unmanned aerial vehicle, T represents the target, R represents the relative distance between the i-th unmanned aerial vehicle P_i and the target T, φ represents the azimuth angle of the i-th unmanned aerial vehicle P_i, and q represents the target incident angle of the i-th unmanned aerial vehicle P_i, where i = 1, 2, 3, …, n and n is the number of unmanned aerial vehicles.
In the invention, the parameters for representing the confrontation situation of the unmanned aerial vehicle cluster and the target comprise position coordinates, speed, relative distance, azimuth angle and target incidence angle of the two confrontation parties.
Step S14, constructing a maneuver library of the two countermeasures;
the maneuver library refers to a selectable set of maneuver decisions for the target in the countermeasure process, in which selectable maneuvers or sequences of maneuvers are contained.
For a drone, the basic maneuvers include: 1) maximum acceleration, 2) maximum deceleration, 3) maximum-overload climb, 4) maximum-overload dive, 5) maximum-overload left turn, 6) maximum-overload right turn, and 7) steady flight (all control quantities unchanged). On this basis, the invention takes maneuver direction into account and expands the drone's maneuver set to 11 actions, as shown in fig. 3.
Assuming that the unmanned aerial vehicle performs uniformly accelerated motion over a short time, its flight path is a short circular arc whose velocity is governed only by the horizontal acceleration α_i, the vertical acceleration β_i, and the lateral acceleration ρ_i. The control magnitudes are held constant, and maneuvers are executed at maximum overload so that turning maneuvers complete as quickly as possible.
Therefore, the relationship between the heading angle, the track angle, and the control quantities can be established, giving the control-quantity table of the 11 unmanned aerial vehicle maneuvers shown below:

| Serial number | Maneuver mode | Control quantity |
| --- | --- | --- |
| 1 | Uniform-velocity forward flight F | (0, 0, 0) |
| 2 | Accelerated flight A | (α_imax, 0, 0) |
| 3 | Decelerated flight S | (−α_imax, 0, 0) |
| 4 | Left turn L | (0, 0, ρ_imax) |
| 5 | Right turn R | (0, 0, −ρ_imax) |
| 6 | Climb C | (0, β_imax, 0) |
| 7 | Left climb LC | (0, β_imax, ρ_imax) |
| 8 | Right climb RC | (0, −β_imax, ρ_imax) |
| 9 | Dive D | (0, −β_imax, 0) |
| 10 | Left dive LD | (0, β_imax, −ρ_imax) |
| 11 | Right dive RD | (0, −β_imax, −ρ_imax) |

wherein α_imax, β_imax, and ρ_imax are, respectively, the maximum accelerations in each direction.
For the target, there are 5 maneuvers in total: uniform advance, accelerated advance, decelerated advance, left turn, and right turn. The target's motion is governed only by the horizontal acceleration α_i and the lateral acceleration ρ_i; the control magnitudes are held constant, and maneuvers are executed at maximum overload so that turning maneuvers complete as quickly as possible.
Therefore, the relationship between the course angle and the controlled variable can be established, and a controlled variable set table of the target maneuver can be obtained, as shown in the following table:
wherein α_imax and ρ_imax are, respectively, the maximum accelerations in each direction.
Step S15, constructing a maneuvering attacking and defending library of both confrontation parties;
taking a ship as an example, as shown in fig. 4, the target's defense means include: passive jamming, such as a chaff cloud, which the target launches and which then drifts outward in a straight line at constant speed; active jamming, which forms an echo region with radiation angle α and can adjust its attack direction at rate u; and dense-array defense, located on both sides of the target, with attack angle θ and attack direction adjustable at rate v.
The details are shown in the following table:
the unmanned aerial vehicle's defense strategies are mainly anti-interference strategies, including: changing the radar cross section (RCS) through its smart skin to avoid enemy detection; releasing jamming decoys, which can be actively deployed to create false targets and deceive the enemy when the drone is detected and countered; and route planning, i.e., re-planning the flight route.
The details are shown in the following table:
the countermeasure between the unmanned aerial vehicle group and the target is a continuous maneuvering process, which is discretized from the game-theoretic perspective: at decision time t, each party (unmanned aerial vehicle P or target T) judges the current-stage situation and selects a maneuver based on the maneuver library and the maneuver attack-and-defense library; the maneuver lasts Δt, after which the process enters the next stage, as shown in fig. 5.
The maneuver and attack and defense strategies are made at the decision time and continue to the next decision time, so the three-dimensional maneuver and attack and defense strategies of the unmanned aerial vehicle can be converted into:
in the formula, the first symbol denotes the situation information of countermeasure node (unmanned aerial vehicle) i at the current stage, and the second denotes the situation information of countermeasure node i at the next stage after performing maneuver j.
Both opposing parties select the maneuver mode that yields the greatest combat advantage at the next stage as the maneuver decision of the current stage; this forms the confrontation game, which continues until the engagement with the locked opponent ends.
Step S2, constructing a random game model by taking the two confrontation parties as agents and taking the two-person zero-sum game as a condition;
the random game can be represented as a tuple (n, S, A)1,…An,T,γ,R1,…Rn) Wherein the elements contained are:
(1) the number n: indicating the number of players.
(2) State S: the state is a description of the environment; after an agent takes an action, the state changes, and the evolution is Markovian.
(3) Action A: an action is a description of an agent's behavior and is the result of a decision. The action space may be discrete or continuous.
(4) Transition function T: governed by the players' current state s and one action A_i from each agent, with transition probabilities in [0, 1].
(5) Discount factor γ: the discount factor is the decay to the future reward, γ ∈ [0,1 ].
(6) Return function R: the reward obtained by the designated player when the joint action (A_1, …, A_n) is taken in state s and state s′ is reached.
Each agent in the game environment is defined by the state set S and the action sets A_1, …, A_k; the state transition is controlled by the current state s and one action A_i from each agent, and each agent has an associated reward function whose expected discounted sum it tries to maximize. The next state and the players' returns in the stochastic game depend only on the current state and the current actions of all players. Solving the stochastic game requires finding a strategy π that maximizes each player's future discounted return under discount factor γ.
The invention takes both opposing parties, the unmanned aerial vehicles and the target, as agents and models the confrontation game under the two-player zero-sum condition.
Firstly, the state space S, action space A, and reward function R required by each agent in the stochastic game environment are determined. The agent decides on an action A_i for the current state s, reaches the next state s′, and obtains the feedback reward r from interacting with the environment; the next round of interaction then begins, forming a loop.
1) The number n: the total number of players in the two-party confrontation, namely the unmanned aerial vehicle number and the target number.
2) State S: according to the factors influencing the situation of the two parties, the state features of both sides can be determined; they mainly comprise the position coordinates (x, y, z), velocity v, relative distance R, azimuth angle φ, and target incident angle q of the two opposing parties.
The state space of the game can thus be expressed as S = {x, y, z, v, R, φ, q}. Since the state space of the confrontation is a continuous infinite space, a learning method is required to handle these features.
3) Action A: the drone has 11 selectable maneuvers: uniform-velocity forward flight F, accelerated flight A, decelerated flight S, left turn L, right turn R, climb C, left climb LC, right climb RC, dive D, left dive LD, and right dive RD. Constructing a discrete action space, the drone's action space is A_p = {F, A, S, L, R, C, LC, RC, D, LD, RD}. The target has 5 selectable maneuvers: uniform advance F, accelerated advance A, decelerated advance S, left turn L, and right turn R, giving the target's action space A_T = {F, A, S, L, R}.
4) Transition function T: taking the drone as an example, the probability that its current state s transfers to the next state s′ under the joint behavior (a, o), where a is the action the drone selects according to its strategy and o is the action selected by the opponent target.
5) Discount factor γ: the discount factor is chosen in [0,1], for example around 0.9.
6) Return function R: in the stochastic game, Q(s, a, o) denotes the expected reward when our side takes action a and the adversary takes action o in state s. According to the drone's attack zone, the target entering the drone's attack range is defined as the favorable situation. For the drone's reward value r: r = 1 if the drone reaches the favorable situation, and r = −1 if the enemy target reaches the favorable situation.
In the countermeasure process, the situation parameters of the two parties mainly include the position coordinates, the speed, the relative distance, the azimuth angle and the target incident angle of the two parties, and then the advantage reward function is as follows:
in the formula, r_{P_iT} represents the advantage situation reward of unmanned aerial vehicle P_i with respect to the target T, Δd represents the Euclidean distance between the two parties, Δh represents the height difference between the two parties, φ represents the azimuth angle of unmanned aerial vehicle P_i with respect to the target T, and q denotes the target incident angle of unmanned aerial vehicle P_i.
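The exact form of the advantage reward function is not reproduced above; the sketch below computes only two of its stated inputs, the Euclidean distance Δd and the height difference Δh, together with the ±1 terminal reward described in the text. The helper names and the assumption that the z coordinate is altitude are illustrative.

```python
import math

def situation(drone_pos, target_pos):
    """Euclidean distance and height difference between drone and target,
    two of the inputs to the advantage reward function."""
    dx, dy, dz = (d - t for d, t in zip(drone_pos, target_pos))
    delta_d = math.sqrt(dx * dx + dy * dy + dz * dz)
    delta_h = drone_pos[2] - target_pos[2]  # assuming z is altitude
    return delta_d, delta_h

def terminal_reward(drone_favorable, target_favorable):
    """r = 1 if the drone reaches the favorable situation, r = -1 if the enemy does."""
    if drone_favorable:
        return 1
    if target_favorable:
        return -1
    return 0
```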
For a multi-player stochastic game with known return and transition functions, one seeks its Nash equilibrium solution, i.e., a joint strategy in which each agent's strategy is a probability distribution over its action space. Since the expected return in a game environment is influenced by the opponent's strategy, the opponent's actions are generally unpredictable in a two-player adversarial game.
On this basis, the invention adopts the Minimax algorithm to select the optimal strategy of the stochastic game. Assuming a highly capable opponent that selects the action minimizing our payoff, our side selects the action maximizing that minimum payoff. The significance of the Minimax algorithm is thus to obtain the maximum return in the worst case.
The stochastic game value function represents the expected discounted return obtained under the optimal strategy; the state value function V(s) and the state-action value function Q(s, a) are respectively:
V(s) = E[ Σ_{t=0}^∞ γ^t·r_t | s_0 = s ]
Q(s, a) = E[ Σ_{t=0}^∞ γ^t·r_t | s_0 = s, a_0 = a ]
where T (s, a, o, s ') represents the transition probability of state s reaching state s' through actions a and o.
From this, the optimal value function V(s) in stochastic game state s can be expressed as:
V(s) = max_π min_{o∈A_T} Σ_{a∈A_p} π(a)·Q(s, a, o)
according to the above formula, the optimal strategy pi and the optimal value function V in the state s can be obtained by using a linear programming constraint method.
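The patent obtains the optimal mixed strategy π and value V by linear programming. As a dependency-free illustration only, the sketch below instead approximates the minimax value of one state's Q matrix by fictitious play; this is a swapped-in technique, not the patent's LP method, and all names are hypothetical.

```python
def matrix_game_value(Q, iters=20000):
    """Approximate the minimax value max_pi min_o sum_a pi(a)*Q[a][o] of a
    zero-sum matrix game by fictitious play. Q[a][o] is the row player's payoff.
    Returns (approximate value, empirical row strategy)."""
    n_a, n_o = len(Q), len(Q[0])
    row_counts = [0] * n_a
    col_counts = [0] * n_o
    row_counts[0] = col_counts[0] = 1  # arbitrary initial plays
    for _ in range(iters):
        # each player best-responds to the opponent's empirical mixture
        a = max(range(n_a), key=lambda i: sum(Q[i][j] * col_counts[j] for j in range(n_o)))
        o = min(range(n_o), key=lambda j: sum(Q[i][j] * row_counts[i] for i in range(n_a)))
        row_counts[a] += 1
        col_counts[o] += 1
    total = sum(row_counts)
    pi = [c / total for c in row_counts]
    value = min(sum(pi[i] * Q[i][j] for i in range(n_a)) for j in range(n_o))
    return value, pi
```

For matching pennies the approximation recovers the mixed equilibrium near (0.5, 0.5) with value near 0; an LP solver would return the exact solution in one shot, which is why the patent uses linear programming.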
The action-state value function Q(s, a, o) for unmanned aerial vehicle action a and target action o in state s is:
Q(s, a, o) = R(s, a, o) + γ·Σ_{s′} T(s, a, o, s′)·V(s′)
through the recursive equation, a converged optimal value function can be obtained through iteration, and an optimal strategy pi is obtained.
Because both players of the game use mixed strategies, each selects actions according to a probability distribution over all actions rather than deterministically, and that distribution is exactly the optimal strategy π obtained by linear programming. The invention therefore samples actions by roulette-wheel selection: the higher an action's fitness (probability), the more likely it is to be selected.
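The roulette-wheel selection described above can be sketched in a few lines; the function name and the guard against floating-point round-off are illustrative details.

```python
import random

def roulette_select(probabilities, rng=random):
    """Sample an action index from a mixed strategy by roulette-wheel selection:
    the larger an action's probability, the more likely it is chosen."""
    r = rng.random() * sum(probabilities)
    acc = 0.0
    for i, p in enumerate(probabilities):
        acc += p
        if r < acc:
            return i
    return len(probabilities) - 1  # guard against float round-off
```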
Step S3, solving the model by adopting deep reinforcement learning;
since the transfer function is difficult to determine in a game context, for a state transfer function T related to a conventional method for iteratively solving an MDP (Markov decision process) using a value, an asynchronous update mode Q-learning in reinforcement learning may be used instead.
Q-learning updates the current action value function with a temporal-difference target: each time action a is taken in state s, transitioning to state s′ with reward r, the update is:
Q(s,a)=r+γ·V(s′) (9)
since the probability of performing an update is exactly T (s, a, s'), the transfer function can be replaced. The Q-learning method is applied to the random game and can be converted into the following steps:
Q_t(s, a, o) = (1 − α)·Q_{t−1}(s, a, o) + α·(r + γ·V(s′))    (10)
in the formula, α represents the learning rate, r the currently obtained reward, and γ the discount factor, i.e., the decay applied to future rewards.
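Equation (10) is a one-line update; the transcription below is direct, with the default α and γ chosen only for illustration.

```python
def minimax_q_update(q_old, reward, v_next, alpha=0.1, gamma=0.9):
    """Equation (10): Q_t(s,a,o) = (1-alpha)*Q_{t-1}(s,a,o) + alpha*(r + gamma*V(s'))."""
    return (1 - alpha) * q_old + alpha * (reward + gamma * v_next)
```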
Compared with traditional Q-learning, the Minimax-Q method incorporates the idea of game theory: the minimax value replaces the maximum in the Q-learning update, yielding the optimal strategy required under game conditions.
The states involved in the two-party game form a continuous infinite space, so a deep neural network is needed to process the features. The Minimax-Q method can therefore be further extended: a deep neural network is added to approximate the value function, experience replay is used to train the reinforcement-learning process, and a separate target network is set up to handle the temporal-difference target.
DQN replaces the value function of Q-learning with a nonlinear approximation parameterized by a neural network, allowing the high-dimensional nonlinear input data of the adversarial game to be processed. The action value function of DQN corresponds to a set of parameters θ, the weights of each network layer; updating the value function in fact updates θ.
Therefore, the invention stores the current state s obtained from the agent-environment interaction, the action a taken by the red side (unmanned aerial vehicle), the action o taken by the blue side (target), the corresponding reward value r, and the next state s′ reached by executing the actions as a quintuple {s, a, o, r, s′} in a memory bank, randomly extracts data of a certain size from the memory bank as training samples, and calculates the target Q value to train the neural network. The target Q value is calculated by equation (10) above.
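The memory bank holding {s, a, o, r, s′} quintuples is a standard experience-replay buffer. A minimal sketch follows, with illustrative class and method names and a bounded deque standing in for the fixed-size memory.

```python
import random
from collections import deque

class ReplayMemory:
    """Stores {s, a, o, r, s'} quintuples and yields random minibatches."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest entries evicted first

    def push(self, s, a, o, r, s_next):
        self.buffer.append((s, a, o, r, s_next))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```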
As shown in fig. 6, the algorithm steps of Minimax-DQN include:
step S31: initializing, setting an initial state of both parties, initializing the memory bank, and setting an observation value.
Step S32: create two neural networks, a Q network and a target network, with parameters θ and θ⁻ respectively. The input of each network is the state s and the output is the action-value function Q; after a certain number of learning steps, the parameters of the Q network are copied to the target network.
Step S33: the following loop traversal process is performed:
s331: and the unmanned aerial vehicle intelligent body selects the action a according to the current state s and the strategy pi and executes the action a to obtain the next state s' and the obtained reward r. Observing the action o selected by the target intelligent agent in the state s, and storing the { s, a, o, r, s' } quintuple in a memory bank.
S332: part of the data is randomly drawn from the memory bank as a training sample. And taking the s ' value of the training sample as the input of the neural network, and obtaining Q [ s ' ] under the state s ' according to the output of the neural network.
S333: a minimax state value V [ s' ] is obtained by using linear programming according to the formula (8), and a target Q value target _ Q is calculated according to the formula (9).
S334: and calculating a loss function, optimizing by adopting a gradient descent method, and updating Q network parameters.
Step S34: and (4) carrying out linear programming solution by using the Q value output by the trained neural network according to the formula (6) to obtain an optimal strategy pi.
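Steps S31–S34 above can be sketched end to end. In this toy version, value tables stand in for the deep Q network and target network, the environment is random, and the LP of S333 is replaced by the maximin value over pure strategies; everything here is an illustrative assumption, not the patent's implementation:

```python
# Schematic walk-through of steps S31-S34 of Minimax-DQN on a toy environment.
import numpy as np

rng = np.random.default_rng(0)
N_S, N_A, N_O = 4, 3, 2            # states, UAV actions, target actions
GAMMA, ALPHA = 0.9, 0.5

# S31/S32: Q "network" and target network as tables (stand-ins for theta, theta^-).
Q = np.zeros((N_S, N_A, N_O))
Q_target = Q.copy()
R = rng.normal(size=(N_S, N_A, N_O))             # toy reward model
T = rng.integers(0, N_S, size=(N_S, N_A, N_O))   # toy deterministic transitions

def v_minimax(q_s):
    """Crude stand-in for the LP of S333: maximin value over pure strategies."""
    return q_s.min(axis=1).max()

# S33: interaction loop; target-network parameters refreshed every C steps.
s, C = 0, 20
for step in range(200):
    a, o = rng.integers(N_A), rng.integers(N_O)  # exploratory joint actions
    r, s_next = R[s, a, o], T[s, a, o]
    target_q = r + GAMMA * v_minimax(Q_target[s_next])   # S333: target Q value
    Q[s, a, o] += ALPHA * (target_q - Q[s, a, o])        # S334: value update
    if step % C == 0:
        Q_target = Q.copy()                              # copy theta -> theta^-
    s = s_next

# S34: maximin policy extracted from the learned Q values.
policy = Q.min(axis=2).argmax(axis=1)
```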
Therefore, for the autonomous decision problem of an unmanned aerial vehicle group under multi-target confrontation, the invention provides an intelligent decision-solving method based on the multi-body game and deep reinforcement learning. A two-side confrontation model is established to construct an intelligent decision problem model based on multiple games, and deep reinforcement learning is adopted to handle the continuous, infinite state space involved in the two-side confrontation game, so that a decision scheme is obtained and the autonomous decision function of the unmanned aerial vehicle group over multi-target cooperative anti-interference means is realized.
Specifically, the cooperative anti-interference autonomous decision problem model is constructed as a multi-body game in which the target's defense means are foil-cloud (chaff) launching, active interference and dense-array defense, while the unmanned aerial vehicle's countermeasures are changing RCS, actively releasing interference, releasing a towed decoy and route planning. The Minimax-DQN algorithm model is then adopted to solve the unmanned aerial vehicle group's cooperative anti-interference strategy selection against the multiple targets.
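The two action sets named above might be encoded as follows; the enum labels are only illustrative tags, not the patent's actual maneuver library:

```python
# Hypothetical encoding of the two sides' countermeasure action sets.
from enum import Enum

class TargetDefense(Enum):
    FOIL_CLOUD_LAUNCH = 0    # chaff / foil-cloud launching
    ACTIVE_INTERFERENCE = 1
    DENSE_ARRAY_DEFENSE = 2

class UavCountermeasure(Enum):
    CHANGE_RCS = 0
    RELEASE_ACTIVE_INTERFERENCE = 1
    RELEASE_TOWED_DECOY = 2
    ROUTE_PLANNING = 3

n_o = len(TargetDefense)        # opponent (target) action count
n_a = len(UavCountermeasure)    # UAV action count
```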
By adopting the invention, when an unmanned aerial vehicle group confronts multiple targets in a game setting, the group can autonomously select multiple anti-interference means against the enemy by relying on the intelligent decision function based on the multi-body game and deep reinforcement learning, so as to meet the anti-interference application requirements of the group in a confrontation environment. Meanwhile, against defense means adopted by enemy targets such as foil-cloud launching, active interference and dense-array defense, the unmanned aerial vehicle group can autonomously select anti-interference means such as changing RCS, actively releasing interference, releasing a towed decoy and route planning, thereby realizing a cooperative anti-interference countermeasure function and improving the task-execution capability, survivability and cooperative countermeasure efficiency of the intelligent unmanned aerial vehicle group.
According to another aspect of the present invention, there is also disclosed a cooperative autonomous decision-making device for an unmanned aerial vehicle group based on a multi-body game, as shown in fig. 7, including: a processor 401, a memory 402 and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the unmanned aerial vehicle group cooperative autonomous decision-making method according to the above scheme.
In addition, the invention also discloses a non-transitory readable storage medium, wherein the readable storage medium stores a program, and the program is executed by a processor to realize the unmanned aerial vehicle group collaborative autonomous decision-making method.
It should be understood that the processor mentioned in the embodiments of the present invention may be implemented by hardware or may be implemented by software. When implemented in hardware, the processor may be a logic circuit, an integrated circuit, or the like. When implemented in software, the processor may be a general-purpose processor implemented by reading software code stored in a memory.
The processor may be, for example, a Central Processing Unit (CPU), other general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will be appreciated that the memory referred to in embodiments of the invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.
Claims (9)
1. A cooperative and autonomous decision-making method for an unmanned aerial vehicle cluster based on a multi-body game is characterized by comprising the following steps:
constructing a confrontation model of the unmanned aerial vehicle group and the target, wherein the confrontation model comprises motion models of the unmanned aerial vehicle and the target, a maneuver action library of the two confronting sides, and a maneuver attack-and-defense library;
taking the two confronting sides as agents, and constructing a random game model under the condition of a two-player zero-sum game;
and solving the random game model by adopting deep reinforcement learning to obtain an optimal strategy.
2. The unmanned aerial vehicle group cooperative autonomous decision-making method according to claim 1, wherein the motion models of the unmanned aerial vehicle and the target are each expressed by a particle motion equation, and the parameters expressing the confrontation situation between the unmanned aerial vehicle group and the target comprise the position coordinates, speed, relative distance, azimuth angle and target incidence angle of the two confronting sides.
4. The method according to claim 3, wherein in the random game model, the action space A_p of the unmanned aerial vehicle comprises 11 actions, and the action space A_T of the target comprises 5 actions.
5. The unmanned aerial vehicle group cooperative autonomous decision-making method according to claim 2, wherein in the random game model, the dominance reward function is as follows (formula not reproduced):
in the formula, the reward term (symbol not reproduced) represents the dominance-situation reward of unmanned aerial vehicle p_i with respect to the target T; Δd represents the Euclidean distance between the two sides; Δh represents the height difference between the two sides; the azimuth term (symbol not reproduced) represents the azimuth angle of unmanned aerial vehicle p_i with respect to the target T; and q denotes the target incidence angle of unmanned aerial vehicle p_i.
6. The method of claim 1, wherein a current state s, an action a taken by the drone, an action o taken by the target, a corresponding reward value r, and a next state s 'reached by the executed action are stored in a memory as a quintuple { s, a, o, r, s' }, and data of a certain size is randomly extracted from the memory as a training sample, and a target Q value is calculated to train the neural network.
7. The unmanned aerial vehicle group cooperative autonomous decision-making method according to claim 6, wherein the solving of the random game model by adopting deep reinforcement learning comprises the following steps:
step S31: setting an initial state of both sides, initializing a memory bank, and setting an observation value;
step S32: creating a Q network and a target network, wherein the Q network parameter is θ and the target network parameter is θ⁻; the input of the neural network is the state s, the output is the action-state value function Q, and after learning a certain number of times, the parameters of the Q network are copied to the target network;
step S33: the following loop traversal process is performed:
s331: the unmanned aerial vehicle selects an action a according to the current state s and the strategy pi and executes the action a to obtain the next state s' and the obtained reward r; observing an action o selected by a target in a state s, and storing a { s, a, o, r, s' } quintuple in a memory bank;
s332: randomly extracting partial data from a memory base to serve as a training sample, taking the s ' value of the training sample as the input of a neural network, and obtaining Q [ s ' ] under the state s ' according to the output of the neural network;
s333: obtaining a minimax state value V [ s' ] by using linear programming, and calculating a target Q value target _ Q;
s334: calculating a loss function, optimizing by adopting a gradient descent method, and updating Q network parameters;
step S34: and performing linear programming solution by using the Q value output by the trained neural network to obtain an optimal strategy pi.
8. A cooperative autonomous decision-making device for an unmanned aerial vehicle group based on a multi-body game, comprising a processor and a memory, wherein the memory stores a computer program, and the processor is configured to execute the computer program to implement the unmanned aerial vehicle group cooperative autonomous decision-making method according to any one of claims 1 to 7.
9. A non-transitory storage medium storing a computer program that, when executed by a processor, implements the unmanned aerial vehicle group cooperative autonomous decision-making method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111534368.2A CN114460959A (en) | 2021-12-15 | 2021-12-15 | Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114460959A true CN114460959A (en) | 2022-05-10 |
Family
ID=81405914
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111534368.2A Pending CN114460959A (en) | 2021-12-15 | 2021-12-15 | Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114460959A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112052511A (en) * | 2020-06-15 | 2020-12-08 | 成都蓉奥科技有限公司 | Air combat maneuver strategy generation technology based on deep random game |
Non-Patent Citations (3)
Title |
---|
LIU Zuandong: "Multi-UAV cooperative attack-defense intelligent decision-making based on target intention prediction", China Masters' Theses Full-text Database, Engineering Science and Technology II, pages 031 - 635 *
XU Kangfa: "Intelligent decision-making and evaluation methods for multi-aircraft cooperative air combat", China Masters' Theses Full-text Database, Engineering Science and Technology II, pages 032 - 16 *
MA Wen, et al.: "Close-range air combat maneuver decision-making based on deep stochastic games", Systems Engineering and Electronics, no. 2, pages 443 - 451 *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114727407A (en) * | 2022-05-12 | 2022-07-08 | 中国科学院自动化研究所 | Resource allocation method, device and equipment |
CN114727407B (en) * | 2022-05-12 | 2022-08-26 | 中国科学院自动化研究所 | Resource allocation method, device and equipment |
CN115113642A (en) * | 2022-06-02 | 2022-09-27 | 中国航空工业集团公司沈阳飞机设计研究所 | Multi-unmanned aerial vehicle space-time key feature self-learning cooperative confrontation decision-making method |
CN114911269A (en) * | 2022-06-17 | 2022-08-16 | 电子科技大学 | Networking radar interference strategy generation method based on unmanned aerial vehicle cluster |
CN115877871A (en) * | 2023-03-03 | 2023-03-31 | 北京航空航天大学 | Non-zero and game unmanned aerial vehicle formation control method based on reinforcement learning |
CN116795108A (en) * | 2023-06-09 | 2023-09-22 | 西南交通大学 | Intelligent unmanned vehicle distribution method based on multi-source sensing signals |
CN116795108B (en) * | 2023-06-09 | 2023-12-01 | 西南交通大学 | Intelligent unmanned vehicle distribution method based on multi-source sensing signals |
CN117806364B (en) * | 2023-12-22 | 2024-05-28 | 华中科技大学 | Fight learning architecture, control method and device for aircraft path tracking controller |
CN117707219A (en) * | 2024-02-05 | 2024-03-15 | 西安羚控电子科技有限公司 | Unmanned aerial vehicle cluster investigation countermeasure method and device based on deep reinforcement learning |
CN117707219B (en) * | 2024-02-05 | 2024-05-17 | 西安羚控电子科技有限公司 | Unmanned aerial vehicle cluster investigation countermeasure method and device based on deep reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114460959A (en) | Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game | |
CN113589842B (en) | Unmanned cluster task cooperation method based on multi-agent reinforcement learning | |
CN113791634B (en) | Multi-agent reinforcement learning-based multi-machine air combat decision method | |
CN112783209B (en) | Unmanned aerial vehicle cluster confrontation control method based on pigeon intelligent competition learning | |
CN115291625A (en) | Multi-unmanned aerial vehicle air combat decision method based on multi-agent layered reinforcement learning | |
Wang et al. | Improving maneuver strategy in air combat by alternate freeze games with a deep reinforcement learning algorithm | |
CN111240353A (en) | Unmanned aerial vehicle collaborative air combat decision method based on genetic fuzzy tree | |
CN113741525B (en) | Policy set-based MADDPG multi-unmanned aerial vehicle cooperative attack and defense countermeasure method | |
CN112198892B (en) | Multi-unmanned aerial vehicle intelligent cooperative penetration countermeasure method | |
CN111859541A (en) | PMADDPG multi-unmanned aerial vehicle task decision method based on transfer learning improvement | |
Wang et al. | UAV swarm confrontation using hierarchical multiagent reinforcement learning | |
CN112651486A (en) | Method for improving convergence rate of MADDPG algorithm and application thereof | |
CN114721424A (en) | Multi-unmanned aerial vehicle cooperative countermeasure method, system and storage medium | |
CN113222106A (en) | Intelligent military chess deduction method based on distributed reinforcement learning | |
CN114510078A (en) | Unmanned aerial vehicle maneuver evasion decision-making method based on deep reinforcement learning | |
CN114638339A (en) | Intelligent agent task allocation method based on deep reinforcement learning | |
CN116700079A (en) | Unmanned aerial vehicle countermeasure occupation maneuver control method based on AC-NFSP | |
Liu et al. | Using CIGAR for finding effective group behaviors in RTS game | |
CN116225049A (en) | Multi-unmanned plane wolf-crowd collaborative combat attack and defense decision algorithm | |
CN113741186B (en) | Double-aircraft air combat decision-making method based on near-end strategy optimization | |
Hagelbäck | Multi-agent potential field based architectures for real-time strategy game bots | |
CN114167899B (en) | Unmanned plane bee colony collaborative countermeasure decision-making method and system | |
CN115859778A (en) | Air combat maneuver decision method based on DCL-GWOO algorithm | |
CN115061495A (en) | Unmanned aerial vehicle group confrontation autonomous control method based on eagle pigeon game | |
Liu et al. | Multiagent reinforcement learning with regret matching for robot soccer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||