CN114460959A - Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game - Google Patents

Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game

Info

Publication number
CN114460959A
CN114460959A (application CN202111534368.2A)
Authority
CN
China
Prior art keywords
unmanned aerial
target
aerial vehicle
action
state
Prior art date
Legal status
Pending
Application number
CN202111534368.2A
Other languages
Chinese (zh)
Inventor
程进
邹晓滢
郝明瑞
魏东辉
Current Assignee
Beijing Electromechanical Engineering Research Institute
Original Assignee
Beijing Electromechanical Engineering Research Institute
Priority date
Filing date
Publication date
Application filed by Beijing Electromechanical Engineering Research Institute filed Critical Beijing Electromechanical Engineering Research Institute
Priority to CN202111534368.2A priority Critical patent/CN114460959A/en
Publication of CN114460959A publication Critical patent/CN114460959A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/104Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a cooperative autonomous decision-making method and device for an unmanned aerial vehicle cluster based on a multi-body game. The method comprises: establishing a confrontation model of the unmanned aerial vehicle cluster and the target, the confrontation model comprising motion models of the unmanned aerial vehicles and the target, a maneuver library for both confrontation parties and a maneuver attack-and-defense library; taking the two confrontation parties as agents and constructing a random game model under the two-player zero-sum game condition; and solving the random game model by deep reinforcement learning to obtain the optimal strategy. Relying on the intelligent decision-making function based on the multi-body game and deep reinforcement learning, the invention can autonomously select countermeasures against various kinds of enemy interference, so as to meet the anti-interference application requirements of the unmanned aerial vehicle cluster in a confrontation environment. The cluster can autonomously select anti-interference means matched to the defensive means adopted by the enemy target, thereby realizing a cooperative anti-interference confrontation function and improving the task execution capability, survivability and cooperative confrontation effectiveness of the intelligent unmanned aerial vehicle cluster.

Description

Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game
Technical Field
The invention belongs to the technical field of aircraft control, and particularly relates to a cooperative autonomous decision-making method and device for an unmanned aerial vehicle group based on multi-body game.
Background
To realize autonomous decisions on task-execution means when an unmanned aerial vehicle cluster operates in a complex confrontation environment against multiple targets, the selection of those means can be cast as a task-allocation problem. For example, in cooperative anti-interference, lateral maneuvering, electronic countermeasures and towed-decoy release can each be regarded as a task; anti-interference tasks are allocated to the cluster's resources, and several unmanned aerial vehicles are selected to execute different anti-interference tasks.
The traditional approach formulates the autonomous-decision problem as a task-allocation-oriented multiple traveling salesman problem (MTSP) and solves the resulting model by mixed integer linear programming (MILP). To obtain a more reasonable allocation, dynamic task-time constraints and unmanned aerial vehicle task-capability constraints can be introduced to build an extended multi-aircraft cooperative task-allocation model, while path and time optimization is applied to the MTSP to build an MTSP mathematical programming model. On the basis of the MILP model, the multi-task allocation problem for heterogeneous multiple unmanned aerial vehicles can be added, yielding an improved MILP formulation.
Algorithms that solve the task-allocation problem on such models fall mainly into optimization methods and heuristic methods. The Hungarian algorithm is the most common optimization method and can be generalized to multi-target allocation. Heuristic methods trade solution quality against computation time, aiming to obtain a satisfactory solution within a given time budget. Swarm algorithms, including the ant colony algorithm and particle swarm optimization, are the most widely applied heuristics at present; they imitate the collective behavior of birds, insects and fish in nature.
Disclosure of Invention
The invention aims to provide a cooperative autonomous decision-making method and device for an unmanned aerial vehicle cluster based on a multi-body game, so as to realize game decision-making under incomplete information and meet the dual requirements of robustness and real-time performance for the autonomous decision-making function in cooperative tasks.
In order to achieve the purpose, the invention adopts the following technical scheme:
According to the first aspect of the invention, a cooperative autonomous decision-making method for an unmanned aerial vehicle cluster based on a multi-body game is disclosed, comprising the following steps:
constructing a confrontation model of the unmanned aerial vehicle cluster and the target, wherein the confrontation model comprises motion models of the unmanned aerial vehicles and the target, a maneuver library for both confrontation parties and a maneuver attack-and-defense library;
taking the two confrontation parties as agents and constructing a random game model under the two-player zero-sum game condition;
and solving the random game model by deep reinforcement learning to obtain the optimal strategy.
In some other examples, the motion models of the drone and the target are respectively expressed by a particle motion equation, and the parameters for representing the confrontation situation of the drone cluster and the target include position coordinates, speed, relative distance, azimuth angle, and target incident angle of the confrontation parties.
In some other examples, in the random game model the state S consists of the position coordinates (x, y, z), velocity v, relative distance R, azimuth φ and target incident angle q of the two confrontation parties, and can be expressed as:

S = {x, y, z, v, R, φ, q}
in some other examples, the random game model includes a space of motion a of the dronepComprising 11 actions, an action space A of the objectTIncluding 5 actions.
In some other examples, in the random game model the advantage reward function has the form

r_{piT} = f(Δd, Δh, φ, q)

where r_{piT} denotes the advantage-situation reward of drone pi with respect to the target T, Δd denotes the Euclidean distance between the two parties, Δh denotes the height difference between the two parties, φ denotes the azimuth of drone pi relative to the target T, and q denotes the target incident angle of drone pi.
In other examples, the current state s, the action a taken by the drone, the action o taken by the target, the corresponding reward value r, and the next state s 'reached by the executed action are stored as a quintuple { s, a, o, r, s' } in a memory, and data of a certain size is randomly extracted from the memory as training samples, and the target Q value is calculated to train the neural network.
In some other examples, the solving the random game model using deep reinforcement learning includes:
step S31: setting an initial state of both parties, initializing a memory bank, and setting an observed value;
step S32: creating a Q network and a target network, wherein the Q network parameters are θ and the target network parameters are θ⁻; the input of each network is the state s and the output is the action-state value function Q, and after a certain number of learning steps the Q network parameters are copied to the target network;
step S33: the following loop traversal process is performed:
s331: the unmanned aerial vehicle selects an action a according to the current state s and the strategy pi and executes the action a to obtain the next state s' and the obtained reward r; observing an action o selected by a target in a state s, and storing a { s, a, o, r, s' } quintuple in a memory bank;
s332: randomly extracting part of the data from the memory bank as training samples, taking the next-state values s′ of the training samples as the input of the neural network, and obtaining Q[s′] for state s′ from the network output;
s333: obtaining the minimax state value V[s′] by linear programming, and calculating the target Q value target_Q;
s334: calculating a loss function, optimizing by adopting a gradient descent method, and updating Q network parameters;
step S34: and performing linear programming solution by using the Q value output by the trained neural network to obtain an optimal strategy pi.
According to the second aspect of the invention, a cooperative autonomous decision-making device for an unmanned aerial vehicle cluster based on a multi-body game is disclosed, comprising: a processor, a memory and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the unmanned aerial vehicle cluster cooperative autonomous decision-making method according to the above scheme.
According to the third aspect of the invention, a non-transitory storage medium is disclosed, storing a computer program which, when executed by a processor, implements the unmanned aerial vehicle cluster cooperative autonomous decision-making method according to the above aspects.
By adopting the invention, the unmanned aerial vehicle cluster can, relying on the intelligent decision-making function based on the multi-body game and deep reinforcement learning, autonomously select countermeasures against various kinds of enemy interference, so as to meet the anti-interference application requirements of the unmanned aerial vehicle cluster in a confrontation environment. The cluster can autonomously select anti-interference means matched to the defensive means adopted by the enemy target, thereby realizing a cooperative anti-interference confrontation function and improving the task execution capability, survivability and cooperative confrontation effectiveness of the intelligent unmanned aerial vehicle cluster.
Drawings
Fig. 1 is a schematic flow chart of a cooperative autonomous decision-making method for a multi-body game-based unmanned aerial vehicle cluster according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a relative situation relationship between the unmanned aerial vehicle and a target;
FIG. 3 is a schematic diagram of the unmanned aerial vehicle's expanded basic maneuver library;
FIG. 4 is a schematic diagram of a target defense model;
FIG. 5 is a maneuver decision flow according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart of the Minimax-DQN algorithm according to an embodiment of the present invention;
fig. 7 is a schematic composition diagram of a cooperative autonomous decision-making device for a multi-body game-based unmanned aerial vehicle fleet according to an embodiment of the present invention.
Detailed Description
Hereinafter, the cooperative autonomous decision method and apparatus for an unmanned aerial vehicle fleet according to the present invention will be described in detail with reference to the accompanying drawings and embodiments, but the present invention is not limited to these embodiments.
According to an embodiment of the invention, a cooperative autonomous decision-making method for a unmanned aerial vehicle cluster based on a multi-body game is disclosed, as shown in fig. 1, the method comprises the following steps:
step S1, constructing an confrontation model of the unmanned aerial vehicle cluster and the target;
the method specifically comprises the following steps:
step S11, constructing an unmanned aerial vehicle model;
the unmanned aerial vehicle adopts a three-degree-of-freedom particle motion equation, and the motion model is as follows:
Vx = V·cosθ·cosψ
Vy = V·sinθ
Vz = -V·cosθ·sinψ
dx/dt = Vx
dy/dt = Vy
dz/dt = Vz

where θ is the track inclination angle, ψ is the track deflection angle, V is the unmanned aerial vehicle speed, Vx, Vy and Vz are the velocity components in each direction, and x, y and z are the centroid coordinates.
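A minimal numerical sketch of one integration step of the point-mass kinematics above (a forward-Euler scheme is assumed; the function name and the sign convention for Vz follow the reconstructed equations and are illustrative only):

import math

def step_uav_kinematics(x, y, z, V, theta, psi, dt):
    """One forward-Euler step of the 3-DOF point-mass kinematics.

    theta: track inclination angle (rad), psi: track deflection angle (rad),
    V: speed (m/s). Returns the updated centroid coordinates (x, y, z).
    """
    Vx = V * math.cos(theta) * math.cos(psi)   # horizontal component along x
    Vy = V * math.sin(theta)                   # vertical component (altitude)
    Vz = -V * math.cos(theta) * math.sin(psi)  # lateral component along z
    return x + Vx * dt, y + Vy * dt, z + Vz * dt

# Example: level flight (theta = 0, psi = 0) at 250 m/s for 0.1 s
print(step_uav_kinematics(0.0, 1000.0, 0.0, 250.0, 0.0, 0.0, 0.1))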
Step S12, constructing a target model;
the target adopts a three-degree-of-freedom particle motion equation, and the motion model is as follows:
Vx = V·cosψ
Vy = V·sinψ
dx/dt = Vx
dy/dt = Vy

where ψ is the course angle, V is the target speed, Vx and Vy are the velocity components in each direction, and x and y are the centroid coordinates.
Step S13, acquiring relative situation parameters of the two confrontation parties;
Fig. 2 is a schematic diagram of the confrontation situation between the unmanned aerial vehicle cluster and the target. In the figure, Pi denotes the i-th unmanned aerial vehicle, T denotes the target, R denotes the relative distance between the i-th unmanned aerial vehicle Pi and the target T, φ denotes the azimuth of the i-th unmanned aerial vehicle Pi, and q denotes the target incident angle of the i-th unmanned aerial vehicle Pi, where i = 1, 2, 3, …, n and n is the number of unmanned aerial vehicles.
In the invention, the parameters for representing the confrontation situation of the unmanned aerial vehicle cluster and the target comprise position coordinates, speed, relative distance, azimuth angle and target incidence angle of the two confrontation parties.
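A short sketch of how these confrontation-situation parameters could be computed from the two parties' positions and velocities (numpy-based; the exact angle conventions for the azimuth and the target incident angle are assumptions, since the patent defines them only through fig. 2):

import numpy as np

def situation_parameters(p_pos, p_vel, t_pos, t_vel):
    """Relative situation of UAV Pi versus target T.

    Returns (R, delta_h, phi, q):
      R       - relative (Euclidean) distance between the two parties,
      delta_h - height difference (y is altitude in the motion model),
      phi     - azimuth: assumed angle between the UAV velocity and the line of sight,
      q       - target incident angle: assumed angle between the target velocity
                and the line of sight.
    """
    los = np.asarray(t_pos, float) - np.asarray(p_pos, float)  # line of sight UAV -> target
    R = float(np.linalg.norm(los))
    delta_h = float(t_pos[1] - p_pos[1])

    def angle(u, v):
        c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
        return float(np.arccos(np.clip(c, -1.0, 1.0)))

    phi = angle(np.asarray(p_vel, float), los)
    q = angle(np.asarray(t_vel, float), los)
    return R, delta_h, phi, q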
Step S14, constructing a maneuver library of the two countermeasures;
The maneuver library is the set of selectable maneuver decisions available to a party during the confrontation; it contains the selectable maneuvers or maneuver sequences.
For an unmanned aerial vehicle, the basic maneuvers include: 1) maximum acceleration, 2) maximum deceleration, 3) maximum-overload climb, 4) maximum-overload dive, 5) maximum-overload left turn, 6) maximum-overload right turn, and 7) steady flight (all control quantities unchanged). On this basis, the invention takes the maneuvering direction into account and expands the unmanned aerial vehicle's maneuvers to 11 types, as shown in fig. 3.
Assuming that the unmanned aerial vehicle performs uniformly accelerated motion over a short, finite time interval, its flight path is a small circular arc whose velocity is governed only by the horizontal acceleration αi, the vertical acceleration βi and the lateral acceleration ρi; the magnitudes of these control quantities are held constant, and maneuvers are executed at maximum overload so that turning maneuvers are completed as quickly as possible.
Accordingly, the relationship between the course angle, the track angle and the control quantities can be established, yielding the control-quantity table for the 11 unmanned aerial vehicle maneuvers shown below:
No.  Maneuver                        Control quantity (αi, βi, ρi)
1    Steady forward flight F         (0, 0, 0)
2    Accelerated forward flight A    (αimax, 0, 0)
3    Decelerated forward flight S    (-αimax, 0, 0)
4    Left turn L                     (0, 0, ρimax)
5    Right turn R                    (0, 0, -ρimax)
6    Climb C                         (0, βimax, 0)
7    Left climb LC                   (0, βimax, ρimax)
8    Right climb RC                  (0, βimax, -ρimax)
9    Dive D                          (0, -βimax, 0)
10   Left dive LD                    (0, -βimax, ρimax)
11   Right dive RD                   (0, -βimax, -ρimax)
where αimax, βimax and ρimax are the maximum acceleration values in each respective direction.
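The maneuver library above maps naturally onto a lookup table from maneuver label to control quantity (αi, βi, ρi); a sketch with placeholder maximum accelerations (the numerical values are illustrative only, and a corresponding 5-entry table would hold the target's maneuvers):

# Per-axis maximum accelerations; the values are placeholders, not taken from the patent.
ALPHA_MAX, BETA_MAX, RHO_MAX = 9.0, 6.0, 6.0   # m/s^2

# Control quantities (alpha_i, beta_i, rho_i) for the 11 UAV maneuvers.
UAV_MANEUVERS = {
    "F":  (0.0,        0.0,       0.0),       # steady forward flight
    "A":  ( ALPHA_MAX, 0.0,       0.0),       # accelerated forward flight
    "S":  (-ALPHA_MAX, 0.0,       0.0),       # decelerated forward flight
    "L":  (0.0,        0.0,       RHO_MAX),   # left turn
    "R":  (0.0,        0.0,      -RHO_MAX),   # right turn
    "C":  (0.0,        BETA_MAX,  0.0),       # climb
    "LC": (0.0,        BETA_MAX,  RHO_MAX),   # left climb
    "RC": (0.0,        BETA_MAX, -RHO_MAX),   # right climb
    "D":  (0.0,       -BETA_MAX,  0.0),       # dive
    "LD": (0.0,       -BETA_MAX,  RHO_MAX),   # left dive
    "RD": (0.0,       -BETA_MAX, -RHO_MAX),   # right dive
}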
For the target, there are 5 maneuvers in total: uniform forward motion, accelerated forward motion, decelerated forward motion, left turn and right turn. The target's motion is governed only by the horizontal acceleration αi and the lateral acceleration ρi; the magnitudes of these control quantities are held constant, and maneuvers are executed at maximum overload so that turning maneuvers are completed as quickly as possible.
Accordingly, the relationship between the course angle and the control quantities can be established, yielding the control-quantity table for the target maneuvers shown below:
No.  Maneuver                        Control quantity (αi, ρi)
1    Uniform forward motion F        (0, 0)
2    Accelerated forward motion A    (αimax, 0)
3    Decelerated forward motion S    (-αimax, 0)
4    Left turn L                     (0, ρimax)
5    Right turn R                    (0, -ρimax)

where αimax and ρimax are the maximum acceleration values in each respective direction.
Step S15, constructing a maneuvering attacking and defending library of both confrontation parties;
Taking a ship as an example, as shown in fig. 4, the target's defensive means include: passive jamming, such as a chaff (foil) cloud, which is launched by the target and moves outward as a whole in a straight line at constant speed; active jamming, which forms an echo region with radiation angle α whose attack direction can be adjusted at rate u; and dense-array (close-in) defense, located on both sides of the target, with attack angle θ and an attack direction adjustable at rate v.
The details are summarized in the following table:

Defense means          Characteristics
Chaff (foil) cloud     Passive jamming; launched by the target, moves outward in a straight line at constant speed
Active jamming         Forms an echo region with radiation angle α; attack direction adjustable at rate u
Dense-array defense    Located on both sides of the target; attack angle θ; attack direction adjustable at rate v
The unmanned aerial vehicle's defensive strategies are mainly anti-interference strategies, including: changing its RCS through its own intelligent skin so as to avoid enemy detection; releasing a jamming decoy, i.e. when detected and countered by the enemy the vehicle can actively release the decoy to create a false target that deceives the enemy; and route planning, i.e. re-planning the flight route.
The details are summarized in the following table:

Anti-interference means    Characteristics
RCS change                 The intelligent skin changes the vehicle's RCS to avoid enemy detection
Jamming decoy release      When detected and countered, a decoy is actively released to form a false target
Route planning             The flight route is re-planned
The confrontation between the unmanned aerial vehicle cluster and the target is a continuous maneuvering process. From the game-theoretic perspective this process is discretized: at decision time t, each unmanned aerial vehicle (and likewise the target) selects a maneuver from the maneuver library and the maneuver attack-and-defense library according to the current-stage situation; the maneuver continues for Δt, after which the next stage begins, as shown in fig. 5.
The maneuver and attack-and-defense strategy is made at a decision time and continues until the next decision time, so the three-dimensional maneuver and attack-and-defense strategy of the unmanned aerial vehicle can be expressed as the stage transition

s_i(t) → s_{i,j}(t + Δt)

where s_i(t) is the situation information of confrontation node (unmanned aerial vehicle) i at the current stage, and s_{i,j}(t + Δt) is the situation information of node i at the next stage after performing maneuver j.
Both confrontation parties select, as the current-stage decision, the maneuver that yields the greatest combat advantage in the next stage; this repeated selection forms the confrontation game, which continues until one side locks onto the other and the engagement ends. A schematic of this decision cycle is given in the sketch after this paragraph.
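A compact sketch of the discretized decision cycle (the callables are hypothetical hooks standing in for the decision and propagation logic described above):

def confrontation_loop(state, dt, max_steps, choose_uav_action,
                       choose_target_action, propagate, engagement_over):
    """Discretized maneuver-decision cycle.

    choose_uav_action(state) / choose_target_action(state) return a maneuver label,
    propagate(state, a, o, dt) advances the situation by one stage of length dt,
    engagement_over(state) reports whether one side has locked onto the other.
    """
    for _ in range(max_steps):
        a = choose_uav_action(state)        # UAV-side decision at time t
        o = choose_target_action(state)     # target-side decision at time t
        state = propagate(state, a, o, dt)  # both maneuvers last for dt
        if engagement_over(state):
            break
    return state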
Step S2, constructing a random game model by taking the two confrontation parties as agents and taking the two-person zero-sum game as a condition;
A random game can be represented as a tuple (n, S, A1, …, An, T, γ, R1, …, Rn), whose elements are:
(1) Number n: the number of players.
(2) State S: the state is a description of the environment; after an agent takes an action the state changes, and the evolution is Markovian.
(3) Action A: an action is a description of an agent's behavior and is the result of a decision. The action space may be discrete or continuous.
(4) Transfer function T: the state transition is governed by the players' current state s and one action Ai from each agent, with transition probabilities lying in [0, 1].
(5) Discount factor γ: the discount factor is the decay applied to future rewards, γ ∈ [0, 1].
(6) Return function R: the reward obtained by a designated player in state s′ after the joint action (A1, …, An) is taken in state s.
Each agent in the game environment is defined by the state set S and an action set A1, …, Ak; the state transition is controlled by the current state s and one action Ai from each agent, and each agent has an associated reward function whose expected discounted sum it tries to maximize. In a random game, the next state and the players' returns depend only on the current state and the current actions of all players. Solving the random game requires finding a strategy π that maximizes each player's future discounted return under the discount factor γ.
The invention takes both confrontation parties, the unmanned aerial vehicle and the target, as agents and models the confrontation game under the two-player zero-sum game condition.
First, the state space S, the action space A and the reward function R required by each agent in the random game environment are determined. For the current state s the agent decides on an action Ai, reaches the next state s′ and obtains the feedback reward r from its interaction with the environment; the next round of interaction then begins, forming a cycle.
1) The number n: the total number of players in the two-party confrontation, namely the unmanned aerial vehicle number and the target number.
2) State S: based on the factors that influence the situation of the two parties, the state features can be determined; they mainly comprise the position coordinates (x, y, z), velocity v, relative distance R, azimuth φ and target incident angle q of the two confrontation parties.
The state space of the game can therefore be expressed as:

S = {x, y, z, v, R, φ, q}
since the state space of the confrontation is a continuous infinite space, a learning method is required to deal with these features.
3) Action A: the unmanned aerial vehicle has 11 selectable maneuvers: steady forward flight F, accelerated forward flight A, decelerated forward flight S, left turn L, right turn R, climb C, left climb LC, right climb RC, dive D, left dive LD and right dive RD. Constructing a discrete action space, the unmanned aerial vehicle's action space is Ap = {F, A, S, L, R, C, LC, RC, D, LD, RD}. The target has 5 selectable maneuvers: uniform forward motion F, accelerated forward motion A, decelerated forward motion S, left turn L and right turn R, giving the target action space AT = {F, A, S, L, R}.
4) Transfer function T: taking the unmanned aerial vehicle as an example, the probability that its current state s transfers to the next state s′ under the joint behavior (a, o), where a is the action the unmanned aerial vehicle selects according to its policy and o is the action selected by the opponent target.
5) Discount factor γ: the discount factor is chosen in [0,1], for example around 0.9.
6) Return function R: in the random game, Q(s, a, o) denotes the expected reward when our side takes action a and the adversary takes action o in state s. Based on the unmanned aerial vehicle's attack zone, reaching the unmanned aerial vehicle's attack range is defined as the favorable situation. For the unmanned aerial vehicle's reward value r, the return is r = 1 if the unmanned aerial vehicle reaches the favorable situation and r = -1 if the enemy target reaches the favorable situation.
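A minimal sketch of this terminal reward, assuming hypothetical predicates that report whether either side has brought the other into its attack envelope:

def terminal_reward(uav_in_advantage: bool, target_in_advantage: bool) -> float:
    """r = +1 if the UAV reaches the favorable situation, r = -1 if the enemy
    target reaches it, otherwise 0."""
    if uav_in_advantage:
        return 1.0
    if target_in_advantage:
        return -1.0
    return 0.0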
In the countermeasure process, the situation parameters of the two parties mainly include the position coordinates, the speed, the relative distance, the azimuth angle and the target incident angle of the two parties, and then the advantage reward function is as follows:
Figure BDA0003412640210000111
in the formula, rpiTRepresenting unmanned plane piThe advantage situation reward with respect to the target T, Δ d represents the euclidean distance between the two parties, Δ h represents the height difference between the two parties,
Figure BDA0003412640210000112
representing unmanned plane piAzimuth angle with respect to target T, q denotes drone piThe target incident angle of.
For a multi-player random game with known return and transfer functions, the aim is to obtain a Nash-equilibrium solution, i.e. a joint strategy in which each agent's strategy is a probability distribution over its action space. Since in a game environment the expected return is influenced by the opponent's strategy, the opponent's actions are generally unpredictable in a two-player adversarial game.
On this basis, the invention adopts the Minimax algorithm to select the optimal strategy of the random game. Assuming the opponent has strong decision-making capability, our side selects the action that maximizes its own payoff on the premise that the opponent selects the action that minimizes our payoff. The significance of the Minimax algorithm is therefore to obtain the maximum return in the worst case.
The random game value function represents the expected discounted return obtained under the optimal strategy. The state value function V(s) and the state-action value function Q(s, a, o) are respectively:

V(s) = E[ Σ_t γ^t · r_t | s_0 = s ]
Q(s, a, o) = E[ Σ_t γ^t · r_t | s_0 = s, a_0 = a, o_0 = o ]

where the transitions between successive states are governed by T(s, a, o, s′), the probability of moving from state s to state s′ under actions a and o.
From this, the optimal value function V(s) in random-game state s can be expressed as:

V(s) = max_{π(s,·)} min_{o ∈ AT} Σ_{a ∈ Ap} π(s, a) · Q(s, a, o)

According to this formula, the optimal strategy π and the optimal value function V in state s can be obtained by a linear-programming constraint method.
The action-state value function Q(s, a, o) for unmanned aerial vehicle action a and target action o in state s is:

Q(s, a, o) = R(s, a, o) + γ · Σ_{s′} T(s, a, o, s′) · V(s′)

By iterating this recursive equation, a converged optimal value function can be obtained, and from it the optimal strategy π.
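A compact sketch of the linear program that extracts the minimax value V(s) and the mixed strategy π(s, ·) from the Q values of one state (SciPy-based; the |Ap| x |AT| array Q_s of Q(s, a, o) values is an assumed input):

import numpy as np
from scipy.optimize import linprog

def minimax_value_and_policy(Q_s):
    """Solve  max_pi min_o sum_a pi(a) * Q_s[a, o]  by linear programming.

    Variables x = [pi_1, ..., pi_n, v]; maximize v (i.e. minimize -v) subject to
    v - sum_a pi(a) * Q_s[a, o] <= 0 for every opponent action o,
    sum_a pi(a) = 1 and pi >= 0.
    """
    n_a, n_o = Q_s.shape
    c = np.zeros(n_a + 1)
    c[-1] = -1.0                                   # minimize -v
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])  # one row per opponent action o
    b_ub = np.zeros(n_o)
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    v, pi = res.x[-1], res.x[:n_a]
    return v, pi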
Because both players of the game use mixed strategies, that is, neither player deterministically selects a single action but instead assigns a selection probability to every action, and these probabilities are exactly the optimal strategy π obtained by linear programming, the invention selects actions by roulette-wheel selection: the higher an action's probability (fitness), the more likely it is to be selected.
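Sampling an action from the mixed strategy π (roulette-wheel selection) is then straightforward, for example:

import numpy as np

def roulette_select(pi, rng=None):
    """Pick an action index with probability proportional to pi."""
    rng = np.random.default_rng() if rng is None else rng
    pi = np.clip(np.asarray(pi, dtype=float), 0.0, None)  # guard against tiny negatives
    return int(rng.choice(len(pi), p=pi / pi.sum()))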
Step S3, solving the model by adopting deep reinforcement learning;
since the transfer function is difficult to determine in a game context, for a state transfer function T related to a conventional method for iteratively solving an MDP (Markov decision process) using a value, an asynchronous update mode Q-learning in reinforcement learning may be used instead.
Q-learning updates the current action-value function with a temporal-difference target; each time action a is taken in state s, transitioning to state s′ with reward r, the update is

Q(s, a) = r + γ·V(s′)   (9)

Because each such update is performed with probability exactly T(s, a, s′), the explicit transfer function can be dispensed with. Applied to the random game, the Q-learning update becomes

Q_t(s, a, o) = (1 - α)·Q_{t-1}(s, a, o) + α·(r + γ·V(s′))   (10)

where α is the learning rate, r is the currently obtained reward, and γ is the discount factor, i.e. the decay applied to future rewards.
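Equation (10) in code form, given the minimax state value of the successor state (a sketch; in practice the continuous state is handled by the neural network described next, so the tabular flavor here is illustrative only):

def minimax_q_update(q_prev, r, v_next, alpha=0.1, gamma=0.9):
    """Eq. (10): Q_t(s,a,o) = (1 - alpha) * Q_{t-1}(s,a,o) + alpha * (r + gamma * V(s'))."""
    return (1.0 - alpha) * q_prev + alpha * (r + gamma * v_next)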
Compared with conventional Q-learning, the Minimax-Q method incorporates game-theoretic reasoning: the max operator in Q-learning is replaced by the minimax value, which yields the optimal strategy required under game conditions.
The states involved in the two-sided game form a continuous, infinite space, so a deep neural network is needed to process the state features. The Minimax-Q method is therefore extended further: a deep neural network is added to approximate the value function, the reinforcement-learning process is trained with experience replay, and a separate target network is set up to handle the temporal-difference target.
DQN replaces the linear function approximation of Q-learning with a nonlinear approximation in the form of neural-network parameters, and can therefore handle the high-dimensional nonlinear input data arising in the adversarial game. The action-value function of DQN corresponds to a set of parameters; θ denotes the weights of each layer of the network, and updating the value function in fact means updating the parameters θ.
Therefore, the invention stores the current state s obtained from the agent-environment interaction, the action a taken by the red side (unmanned aerial vehicle), the action o taken by the blue side (target), the corresponding reward value r, and the next state s′ reached by executing the action as a quintuple {s, a, o, r, s′} in a memory bank; data of a certain size are randomly drawn from the memory bank as training samples, and the target Q value used to train the neural network is calculated according to equation (10).
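A minimal replay-memory sketch for the {s, a, o, r, s′} quintuples (deque-based; the capacity and batch size are illustrative defaults):

import random
from collections import deque

class ReplayMemory:
    """Stores (s, a, o, r, s_next) quintuples and samples random mini-batches."""

    def __init__(self, capacity=50000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, o, r, s_next):
        self.buffer.append((s, a, o, r, s_next))

    def sample(self, batch_size=64):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))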
As shown in fig. 6, the algorithm steps of Minimax-DQN include:
step S31: initializing, setting an initial state of both parties, initializing the memory bank, and setting an observation value.
Step S32: create two neural networks, a Q network and a target network, where the Q network parameters are θ and the target network parameters are θ⁻. The input of each network is the state s and the output is the action-state value function Q; after a certain number of learning steps, the Q network parameters are copied to the target network.
Step S33: the following loop traversal process is performed:
S331: the unmanned aerial vehicle agent selects an action a according to the current state s and the strategy π and executes it, obtaining the next state s′ and the reward r. The action o selected by the target agent in state s is observed, and the quintuple {s, a, o, r, s′} is stored in the memory bank.
S332: part of the data is randomly drawn from the memory bank as training samples. The next-state values s′ of the training samples are taken as the input of the neural network, and Q[s′] for state s′ is obtained from the network output.
S333: the minimax state value V[s′] is obtained by linear programming from the minimax value expression above, and the target Q value target_Q is calculated according to equation (9).
S334: and calculating a loss function, optimizing by adopting a gradient descent method, and updating Q network parameters.
Step S34: perform the linear-programming solution of the minimax value expression using the Q values output by the trained neural network to obtain the optimal strategy π.
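A condensed PyTorch-style sketch of steps S31-S34 under the following assumptions: the Q network outputs an |Ap| x |AT| matrix of Q(s, a, o) values, minimax_value_and_policy, roulette_select and ReplayMemory are the helpers sketched earlier, and env is a hypothetical confrontation environment whose step(a) returns the next state, the observed target action, the reward and a done flag:

import torch
import torch.nn as nn

def train_minimax_dqn(env, n_actions_p=11, n_actions_t=5, state_dim=7,
                      episodes=500, gamma=0.9, sync_every=200, batch_size=64):
    # Step S32: Q network and target network; the target network starts as a copy.
    def make_net():
        return nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                             nn.Linear(128, 128), nn.ReLU(),
                             nn.Linear(128, n_actions_p * n_actions_t))
    q_net, target_net = make_net(), make_net()
    target_net.load_state_dict(q_net.state_dict())
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    memory = ReplayMemory()
    step = 0

    for _ in range(episodes):                       # Step S31: initial state per episode
        s, done = env.reset(), False
        while not done:                             # Step S33: loop traversal
            # S331: choose the UAV action from the mixed strategy of the current Q(s,.,.)
            with torch.no_grad():
                q_s = q_net(torch.as_tensor(s, dtype=torch.float32))
            _, pi = minimax_value_and_policy(
                q_s.reshape(n_actions_p, n_actions_t).numpy())
            a = roulette_select(pi)
            s_next, o, r, done = env.step(a)        # o: observed target action
            memory.push(s, a, o, r, s_next)
            s = s_next

            # S332-S334: sample a batch, build minimax targets, take a gradient step
            batch = memory.sample(batch_size)
            loss = torch.zeros(())
            for bs, ba, bo, br, bs_next in batch:
                with torch.no_grad():
                    q_next = target_net(torch.as_tensor(bs_next, dtype=torch.float32))
                    v_next, _ = minimax_value_and_policy(
                        q_next.reshape(n_actions_p, n_actions_t).numpy())
                target_q = br + gamma * v_next      # target per eq. (9)/(10)
                q_pred = q_net(torch.as_tensor(bs, dtype=torch.float32)
                               ).reshape(n_actions_p, n_actions_t)[ba, bo]
                loss = loss + (q_pred - target_q) ** 2
            optimizer.zero_grad()
            (loss / max(len(batch), 1)).backward()
            optimizer.step()

            step += 1
            if step % sync_every == 0:              # copy Q-network parameters to target net
                target_net.load_state_dict(q_net.state_dict())

    # Step S34: at deployment, the strategy pi for a state is recovered by the same
    # linear program applied to the trained network's Q values.
    return q_net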
The invention thus provides, for the autonomous decision-making problem of an unmanned aerial vehicle cluster under multi-target confrontation, an intelligent decision-solving method based on the multi-body game and deep reinforcement learning: a two-party confrontation model is established, an intelligent decision problem model based on the multi-body game is constructed, and deep reinforcement learning is used to handle the continuous, infinite state space involved in the two-party confrontation game, so that a decision scheme is obtained and the cluster's autonomous decision-making on cooperative anti-interference means against multiple targets is realized.
Specifically, the cooperative anti-interference autonomous decision problem model is constructed using the multi-body game; the target's defensive means are chaff-cloud launching, active jamming and dense-array defense, and the unmanned aerial vehicle's defensive means are RCS changing, active jamming release, towed-decoy release and route planning. The Minimax-DQN algorithm model is used to solve the unmanned aerial vehicle cluster's cooperative anti-interference strategy selection against the multiple targets.
By adopting the invention, when the unmanned aerial vehicle cluster confronts multiple targets in the game, it can autonomously select various anti-interference means against the enemy by relying on the intelligent decision-making function based on the multi-body game and deep reinforcement learning, so as to meet the anti-interference application requirements of the cluster in the confrontation environment. Meanwhile, against defensive means adopted by enemy targets such as chaff-cloud launching, active jamming and dense-array defense, the unmanned aerial vehicle cluster can autonomously select RCS changing, active jamming release, towed-decoy release, route planning and other anti-interference means, thereby realizing a cooperative anti-interference confrontation function and improving the task execution capability, survivability and cooperative confrontation effectiveness of the intelligent unmanned aerial vehicle cluster.
According to another aspect of the present invention, a cooperative autonomous decision-making device for an unmanned aerial vehicle cluster based on a multi-body game is also disclosed, as shown in fig. 7, comprising: a processor 401, a memory 402 and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the unmanned aerial vehicle cluster cooperative autonomous decision-making method according to the above scheme.
In addition, the invention also discloses a non-transitory readable storage medium, wherein the readable storage medium stores a program, and the program is executed by a processor to realize the unmanned aerial vehicle group collaborative autonomous decision-making method.
It should be understood that the processor mentioned in the embodiments of the present invention may be implemented by hardware or may be implemented by software. When implemented in hardware, the processor may be a logic circuit, an integrated circuit, or the like. When implemented in software, the processor may be a general-purpose processor implemented by reading software code stored in a memory.
The processor may be, for example, a Central Processing Unit (CPU), other general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will be appreciated that the memory referred to in embodiments of the invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (9)

1. A cooperative and autonomous decision-making method for an unmanned aerial vehicle cluster based on a multi-body game is characterized by comprising the following steps:
constructing a confrontation model of the unmanned aerial vehicle cluster and the target, wherein the confrontation model comprises motion models of the unmanned aerial vehicles and the target, a maneuver library for both confrontation parties and a maneuver attack-and-defense library;
taking the two confrontation parties as agents and constructing a random game model under the two-player zero-sum game condition;
and solving the random game model by adopting deep reinforcement learning to obtain an optimal strategy.
2. The cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to claim 1, wherein the motion models of the unmanned aerial vehicles and the target are each expressed by a particle motion equation, and the parameters representing the confrontation situation of the unmanned aerial vehicle cluster and the target comprise the position coordinates, velocity, relative distance, azimuth and target incident angle of the confrontation parties.
3. The cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to claim 2, wherein in the random game model the state S consists of the position coordinates (x, y, z), velocity v, relative distance R, azimuth φ and target incident angle q of the two confrontation parties, expressed as:

S = {x, y, z, v, R, φ, q}
4. The cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to claim 3, wherein in the random game model the action space Ap of the unmanned aerial vehicles comprises 11 actions and the action space AT of the target comprises 5 actions.
5. The cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to claim 2, wherein in the random game model the advantage reward function has the form

r_{piT} = f(Δd, Δh, φ, q)

where r_{piT} denotes the advantage-situation reward of unmanned aerial vehicle pi with respect to the target T, Δd denotes the Euclidean distance between the two parties, Δh denotes the height difference between the two parties, φ denotes the azimuth of unmanned aerial vehicle pi relative to the target T, and q denotes the target incident angle of unmanned aerial vehicle pi.
6. The cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to claim 1, wherein a current state s, an action a taken by the unmanned aerial vehicle, an action o taken by the target, the corresponding reward value r and the next state s′ reached by executing the action are stored in a memory bank as a quintuple {s, a, o, r, s′}; data of a certain size are randomly extracted from the memory bank as training samples, and a target Q value is calculated to train the neural network.
7. The cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to claim 6, wherein solving the random game model by deep reinforcement learning comprises the following steps:
step S31: setting an initial state of both sides, initializing a memory bank, and setting an observation value;
step S32: creating a Q network and a target network, wherein the Q network parameters are θ and the target network parameters are θ⁻; the input of each network is the state s and the output is the action-state value function Q, and after a certain number of learning steps the Q network parameters are copied to the target network;
step S33: the following loop traversal process is performed:
s331: the unmanned aerial vehicle selects an action a according to the current state s and the strategy pi and executes the action a to obtain the next state s' and the obtained reward r; observing an action o selected by a target in a state s, and storing a { s, a, o, r, s' } quintuple in a memory bank;
s332: randomly extracting part of the data from the memory bank as training samples, taking the next-state values s′ of the training samples as the input of the neural network, and obtaining Q[s′] for state s′ from the network output;
s333: obtaining the minimax state value V[s′] by linear programming, and calculating the target Q value target_Q;
s334: calculating a loss function, optimizing by adopting a gradient descent method, and updating Q network parameters;
step S34: and performing linear programming solution by using the Q value output by the trained neural network to obtain an optimal strategy pi.
8. A cooperative autonomous decision-making device for an unmanned aerial vehicle cluster based on a multi-body game, comprising a processor and a memory, wherein the memory stores a computer program, and the processor is configured to execute the computer program to implement the cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to any one of claims 1 to 7.
9. A non-transitory storage medium storing a computer program which, when executed by a processor, implements the cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to any one of claims 1 to 7.
CN202111534368.2A 2021-12-15 2021-12-15 Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game Pending CN114460959A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111534368.2A CN114460959A (en) 2021-12-15 2021-12-15 Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game


Publications (1)

Publication Number Publication Date
CN114460959A true CN114460959A (en) 2022-05-10

Family

ID=81405914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111534368.2A Pending CN114460959A (en) 2021-12-15 2021-12-15 Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game

Country Status (1)

Country Link
CN (1) CN114460959A (en)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052511A (en) * 2020-06-15 2020-12-08 成都蓉奥科技有限公司 Air combat maneuver strategy generation technology based on deep random game

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Liu Zuandong: "Intelligent decision-making for multi-UAV cooperative attack and defense based on target intention prediction", China Master's Theses Full-text Database, Engineering Science & Technology II, pages 031-635 *
Xu Kangfa: "Intelligent decision-making and evaluation methods for multi-aircraft cooperative air combat", China Master's Theses Full-text Database, Engineering Science & Technology II, pages 032-16 *
Ma Wen, et al.: "Close-range air combat maneuver decision-making based on deep stochastic games", Systems Engineering and Electronics, no. 2, pages 443-451 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114727407A (en) * 2022-05-12 2022-07-08 中国科学院自动化研究所 Resource allocation method, device and equipment
CN114727407B (en) * 2022-05-12 2022-08-26 中国科学院自动化研究所 Resource allocation method, device and equipment
CN115113642A (en) * 2022-06-02 2022-09-27 中国航空工业集团公司沈阳飞机设计研究所 Multi-unmanned aerial vehicle space-time key feature self-learning cooperative confrontation decision-making method
CN114911269A (en) * 2022-06-17 2022-08-16 电子科技大学 Networking radar interference strategy generation method based on unmanned aerial vehicle cluster
CN115877871A (en) * 2023-03-03 2023-03-31 北京航空航天大学 Non-zero and game unmanned aerial vehicle formation control method based on reinforcement learning
CN116795108A (en) * 2023-06-09 2023-09-22 西南交通大学 Intelligent unmanned vehicle distribution method based on multi-source sensing signals
CN116795108B (en) * 2023-06-09 2023-12-01 西南交通大学 Intelligent unmanned vehicle distribution method based on multi-source sensing signals
CN117806364B (en) * 2023-12-22 2024-05-28 华中科技大学 Fight learning architecture, control method and device for aircraft path tracking controller
CN117707219A (en) * 2024-02-05 2024-03-15 西安羚控电子科技有限公司 Unmanned aerial vehicle cluster investigation countermeasure method and device based on deep reinforcement learning
CN117707219B (en) * 2024-02-05 2024-05-17 西安羚控电子科技有限公司 Unmanned aerial vehicle cluster investigation countermeasure method and device based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
CN114460959A (en) Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game
CN113589842B (en) Unmanned cluster task cooperation method based on multi-agent reinforcement learning
CN113791634B (en) Multi-agent reinforcement learning-based multi-machine air combat decision method
CN112783209B (en) Unmanned aerial vehicle cluster confrontation control method based on pigeon intelligent competition learning
CN115291625A (en) Multi-unmanned aerial vehicle air combat decision method based on multi-agent layered reinforcement learning
Wang et al. Improving maneuver strategy in air combat by alternate freeze games with a deep reinforcement learning algorithm
CN111240353A (en) Unmanned aerial vehicle collaborative air combat decision method based on genetic fuzzy tree
CN113741525B (en) Policy set-based MADDPG multi-unmanned aerial vehicle cooperative attack and defense countermeasure method
CN112198892B (en) Multi-unmanned aerial vehicle intelligent cooperative penetration countermeasure method
CN111859541A (en) PMADDPG multi-unmanned aerial vehicle task decision method based on transfer learning improvement
Wang et al. UAV swarm confrontation using hierarchical multiagent reinforcement learning
CN112651486A (en) Method for improving convergence rate of MADDPG algorithm and application thereof
CN114721424A (en) Multi-unmanned aerial vehicle cooperative countermeasure method, system and storage medium
CN113222106A (en) Intelligent military chess deduction method based on distributed reinforcement learning
CN114510078A (en) Unmanned aerial vehicle maneuver evasion decision-making method based on deep reinforcement learning
CN114638339A (en) Intelligent agent task allocation method based on deep reinforcement learning
CN116700079A (en) Unmanned aerial vehicle countermeasure occupation maneuver control method based on AC-NFSP
Liu et al. Using CIGAR for finding effective group behaviors in RTS game
CN116225049A (en) Multi-unmanned plane wolf-crowd collaborative combat attack and defense decision algorithm
CN113741186B (en) Double-aircraft air combat decision-making method based on near-end strategy optimization
Hagelbäck Multi-agent potential field based architectures for real-time strategy game bots
CN114167899B (en) Unmanned plane bee colony collaborative countermeasure decision-making method and system
CN115859778A (en) Air combat maneuver decision method based on DCL-GWOO algorithm
CN115061495A (en) Unmanned aerial vehicle group confrontation autonomous control method based on eagle pigeon game
Liu et al. Multiagent reinforcement learning with regret matching for robot soccer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination