CN114460959A - Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game - Google Patents

Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game

Info

Publication number
CN114460959A
CN114460959A (application CN202111534368.2A)
Authority
CN
China
Prior art keywords
unmanned aerial
target
aerial vehicle
action
state
Prior art date
Legal status
Pending
Application number
CN202111534368.2A
Other languages
Chinese (zh)
Inventor
程进
邹晓滢
郝明瑞
魏东辉
Current Assignee
Beijing Electromechanical Engineering Research Institute
Original Assignee
Beijing Electromechanical Engineering Research Institute
Priority date
Filing date
Publication date
Application filed by Beijing Electromechanical Engineering Research Institute filed Critical Beijing Electromechanical Engineering Research Institute
Priority to CN202111534368.2A priority Critical patent/CN114460959A/en
Publication of CN114460959A publication Critical patent/CN114460959A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/104Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a cooperative autonomous decision-making method and device for an unmanned aerial vehicle cluster based on a multi-body game. The method comprises: establishing a confrontation model of the unmanned aerial vehicle cluster and the target, the confrontation model comprising motion models of the unmanned aerial vehicles and the target, a maneuver library for both confrontation parties and a maneuver attack-and-defense library; taking the two confrontation parties as agents and constructing a random game model under the two-player zero-sum game condition; and solving the random game model by deep reinforcement learning to obtain the optimal strategy. Relying on the intelligent decision-making function based on the multi-body game and deep reinforcement learning, the invention can autonomously select countermeasures against various kinds of enemy interference, so as to meet the anti-interference application requirements of the unmanned aerial vehicle cluster in a confrontation environment. The cluster can autonomously select anti-interference means matched to the defensive means adopted by the enemy target, thereby realizing a cooperative anti-interference confrontation function and improving the task execution capability, survivability and cooperative confrontation effectiveness of the intelligent unmanned aerial vehicle cluster.

Description

Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game
Technical Field
The invention belongs to the technical field of aircraft control, and particularly relates to a cooperative autonomous decision-making method and device for an unmanned aerial vehicle group based on multi-body game.
Background
To realize autonomous decisions on task-execution means when an unmanned aerial vehicle cluster operates in a complex confrontation environment against multiple targets, the selection of those means can be cast as a task-allocation problem. For example, in cooperative anti-interference, lateral maneuvering, electronic countermeasures and towed-decoy release can each be regarded as a task; anti-interference tasks are allocated to the cluster's resources, and several unmanned aerial vehicles are selected to execute different anti-interference tasks.
The traditional approach formulates the autonomous-decision problem as a task-allocation-oriented multiple traveling salesman problem (MTSP) and solves the resulting model by mixed integer linear programming (MILP). To obtain a more reasonable allocation, dynamic task-time constraints and unmanned aerial vehicle task-capability constraints can be introduced to build an extended multi-aircraft cooperative task-allocation model, while path and time optimization is applied to the MTSP to build an MTSP mathematical programming model. On the basis of the MILP model, the multi-task allocation problem for heterogeneous multiple unmanned aerial vehicles can be added, yielding an improved MILP formulation.
Algorithms that solve the task-allocation problem on such models fall mainly into optimization methods and heuristic methods. The Hungarian algorithm is the most common optimization method and can be generalized to multi-target allocation. Heuristic methods trade solution quality against computation time, aiming to obtain a satisfactory solution within a given time budget. Swarm algorithms, including the ant colony algorithm and particle swarm optimization, are the most widely applied heuristics at present; they imitate the collective behavior of birds, insects and fish in nature.
Disclosure of Invention
The invention aims to provide a cooperative autonomous decision-making method and device for an unmanned aerial vehicle cluster based on a multi-body game, so as to realize game decision-making under incomplete information and meet the dual requirements of robustness and real-time performance for the autonomous decision-making function in cooperative tasks.
In order to achieve the purpose, the invention adopts the following technical scheme:
According to the first aspect of the invention, a cooperative autonomous decision-making method for an unmanned aerial vehicle cluster based on a multi-body game is disclosed, comprising the following steps:
constructing a confrontation model of the unmanned aerial vehicle cluster and the target, wherein the confrontation model comprises motion models of the unmanned aerial vehicles and the target, a maneuver library for both confrontation parties and a maneuver attack-and-defense library;
taking the two confrontation parties as agents and constructing a random game model under the two-player zero-sum game condition;
and solving the random game model by deep reinforcement learning to obtain the optimal strategy.
In some other examples, the motion models of the drone and the target are respectively expressed by a particle motion equation, and the parameters for representing the confrontation situation of the drone cluster and the target include position coordinates, speed, relative distance, azimuth angle, and target incident angle of the confrontation parties.
In some other examples, in the random game model the state S consists of the position coordinates (x, y, z), velocity v, relative distance R, azimuth φ and target incident angle q of the two confrontation parties, and can be expressed as:

S = {x, y, z, v, R, φ, q}
in some other examples, the random game model includes a space of motion a of the dronepComprising 11 actions, an action space A of the objectTIncluding 5 actions.
In some other examples, in the random game model the advantage reward function has the form

r_{piT} = f(Δd, Δh, φ, q)

where r_{piT} denotes the advantage-situation reward of drone pi with respect to the target T, Δd denotes the Euclidean distance between the two parties, Δh denotes the height difference between the two parties, φ denotes the azimuth of drone pi relative to the target T, and q denotes the target incident angle of drone pi.
In other examples, the current state s, the action a taken by the drone, the action o taken by the target, the corresponding reward value r, and the next state s 'reached by the executed action are stored as a quintuple { s, a, o, r, s' } in a memory, and data of a certain size is randomly extracted from the memory as training samples, and the target Q value is calculated to train the neural network.
In some other examples, the solving the random game model using deep reinforcement learning includes:
step S31: setting an initial state of both parties, initializing a memory bank, and setting an observed value;
step S32: creating a Q network and a target network, wherein the Q network parameters are θ and the target network parameters are θ⁻; the input of each network is the state s and the output is the action-state value function Q, and after a certain number of learning steps the Q network parameters are copied to the target network;
step S33: the following loop traversal process is performed:
s331: the unmanned aerial vehicle selects an action a according to the current state s and the strategy pi and executes the action a to obtain the next state s' and the obtained reward r; observing an action o selected by a target in a state s, and storing a { s, a, o, r, s' } quintuple in a memory bank;
s332: randomly extracting part of the data from the memory bank as training samples, taking the next-state values s′ of the training samples as the input of the neural network, and obtaining Q[s′] for state s′ from the network output;
s333: obtaining the minimax state value V[s′] by linear programming, and calculating the target Q value target_Q;
s334: calculating a loss function, optimizing by adopting a gradient descent method, and updating Q network parameters;
step S34: and performing linear programming solution by using the Q value output by the trained neural network to obtain an optimal strategy pi.
According to the second aspect of the invention, a cooperative autonomous decision-making device for an unmanned aerial vehicle cluster based on a multi-body game is disclosed, comprising: a processor, a memory and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the unmanned aerial vehicle cluster cooperative autonomous decision-making method according to the above scheme.
According to the third aspect of the invention, a non-transitory storage medium is disclosed, storing a computer program which, when executed by a processor, implements the unmanned aerial vehicle cluster cooperative autonomous decision-making method according to the above aspects.
By adopting the invention, the unmanned aerial vehicle cluster can, relying on the intelligent decision-making function based on the multi-body game and deep reinforcement learning, autonomously select countermeasures against various kinds of enemy interference, so as to meet the anti-interference application requirements of the unmanned aerial vehicle cluster in a confrontation environment. The cluster can autonomously select anti-interference means matched to the defensive means adopted by the enemy target, thereby realizing a cooperative anti-interference confrontation function and improving the task execution capability, survivability and cooperative confrontation effectiveness of the intelligent unmanned aerial vehicle cluster.
Drawings
Fig. 1 is a schematic flow chart of a cooperative autonomous decision-making method for a multi-body game-based unmanned aerial vehicle cluster according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a relative situation relationship between the unmanned aerial vehicle and a target;
FIG. 3 is a schematic diagram of the unmanned aerial vehicle's expanded basic maneuver library;
FIG. 4 is a schematic diagram of a target defense model;
FIG. 5 is a maneuver decision flow according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart of the Minimax-DQN algorithm according to an embodiment of the present invention;
fig. 7 is a schematic composition diagram of a cooperative autonomous decision-making device for a multi-body game-based unmanned aerial vehicle fleet according to an embodiment of the present invention.
Detailed Description
Hereinafter, the cooperative autonomous decision method and apparatus for an unmanned aerial vehicle fleet according to the present invention will be described in detail with reference to the accompanying drawings and embodiments, but the present invention is not limited to these embodiments.
According to an embodiment of the invention, a cooperative autonomous decision-making method for a unmanned aerial vehicle cluster based on a multi-body game is disclosed, as shown in fig. 1, the method comprises the following steps:
step S1, constructing an confrontation model of the unmanned aerial vehicle cluster and the target;
the method specifically comprises the following steps:
step S11, constructing an unmanned aerial vehicle model;
the unmanned aerial vehicle adopts a three-degree-of-freedom particle motion equation, and the motion model is as follows:
Vx = V·cosθ·cosψ
Vy = V·sinθ
Vz = -V·cosθ·sinψ
dx/dt = Vx
dy/dt = Vy
dz/dt = Vz

where θ is the track inclination angle, ψ is the track deflection angle, V is the unmanned aerial vehicle speed, Vx, Vy and Vz are the velocity components in each direction, and x, y and z are the centroid coordinates.
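A minimal numerical sketch of one integration step of the point-mass kinematics above (a forward-Euler scheme is assumed; the function name and the sign convention for Vz follow the reconstructed equations and are illustrative only):

import math

def step_uav_kinematics(x, y, z, V, theta, psi, dt):
    """One forward-Euler step of the 3-DOF point-mass kinematics.

    theta: track inclination angle (rad), psi: track deflection angle (rad),
    V: speed (m/s). Returns the updated centroid coordinates (x, y, z).
    """
    Vx = V * math.cos(theta) * math.cos(psi)   # horizontal component along x
    Vy = V * math.sin(theta)                   # vertical component (altitude)
    Vz = -V * math.cos(theta) * math.sin(psi)  # lateral component along z
    return x + Vx * dt, y + Vy * dt, z + Vz * dt

# Example: level flight (theta = 0, psi = 0) at 250 m/s for 0.1 s
print(step_uav_kinematics(0.0, 1000.0, 0.0, 250.0, 0.0, 0.0, 0.1))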
Step S12, constructing a target model;
the target adopts a three-degree-of-freedom particle motion equation, and the motion model is as follows:
Vx = V·cosψ
Vy = V·sinψ
dx/dt = Vx
dy/dt = Vy

where ψ is the course angle, V is the target speed, Vx and Vy are the velocity components in each direction, and x and y are the centroid coordinates.
Step S13, acquiring relative situation parameters of the two confrontation parties;
Fig. 2 is a schematic diagram of the confrontation situation between the unmanned aerial vehicle cluster and the target. In the figure, Pi denotes the i-th unmanned aerial vehicle, T denotes the target, R denotes the relative distance between the i-th unmanned aerial vehicle Pi and the target T, φ denotes the azimuth of the i-th unmanned aerial vehicle Pi, and q denotes the target incident angle of the i-th unmanned aerial vehicle Pi, where i = 1, 2, 3, …, n and n is the number of unmanned aerial vehicles.
In the invention, the parameters for representing the confrontation situation of the unmanned aerial vehicle cluster and the target comprise position coordinates, speed, relative distance, azimuth angle and target incidence angle of the two confrontation parties.
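A short sketch of how these confrontation-situation parameters could be computed from the two parties' positions and velocities (numpy-based; the exact angle conventions for the azimuth and the target incident angle are assumptions, since the patent defines them only through fig. 2):

import numpy as np

def situation_parameters(p_pos, p_vel, t_pos, t_vel):
    """Relative situation of UAV Pi versus target T.

    Returns (R, delta_h, phi, q):
      R       - relative (Euclidean) distance between the two parties,
      delta_h - height difference (y is altitude in the motion model),
      phi     - azimuth: assumed angle between the UAV velocity and the line of sight,
      q       - target incident angle: assumed angle between the target velocity
                and the line of sight.
    """
    los = np.asarray(t_pos, float) - np.asarray(p_pos, float)  # line of sight UAV -> target
    R = float(np.linalg.norm(los))
    delta_h = float(t_pos[1] - p_pos[1])

    def angle(u, v):
        c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
        return float(np.arccos(np.clip(c, -1.0, 1.0)))

    phi = angle(np.asarray(p_vel, float), los)
    q = angle(np.asarray(t_vel, float), los)
    return R, delta_h, phi, q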
Step S14, constructing a maneuver library of the two countermeasures;
The maneuver library is the set of selectable maneuver decisions available to a party during the confrontation; it contains the selectable maneuvers or maneuver sequences.
For an unmanned aerial vehicle, the basic maneuvers include: 1) maximum acceleration, 2) maximum deceleration, 3) maximum-overload climb, 4) maximum-overload dive, 5) maximum-overload left turn, 6) maximum-overload right turn, and 7) steady flight (all control quantities unchanged). On this basis, the invention takes the maneuvering direction into account and expands the unmanned aerial vehicle's maneuvers to 11 types, as shown in fig. 3.
Assuming that the unmanned aerial vehicle performs uniformly accelerated motion over a short, finite time interval, its flight path is a small circular arc whose velocity is governed only by the horizontal acceleration αi, the vertical acceleration βi and the lateral acceleration ρi; the magnitudes of these control quantities are held constant, and maneuvers are executed at maximum overload so that turning maneuvers are completed as quickly as possible.
Accordingly, the relationship between the course angle, the track angle and the control quantities can be established, yielding the control-quantity table for the 11 unmanned aerial vehicle maneuvers shown below:
No.  Maneuver                        Control quantity (αi, βi, ρi)
1    Steady forward flight F         (0, 0, 0)
2    Accelerated forward flight A    (αimax, 0, 0)
3    Decelerated forward flight S    (-αimax, 0, 0)
4    Left turn L                     (0, 0, ρimax)
5    Right turn R                    (0, 0, -ρimax)
6    Climb C                         (0, βimax, 0)
7    Left climb LC                   (0, βimax, ρimax)
8    Right climb RC                  (0, βimax, -ρimax)
9    Dive D                          (0, -βimax, 0)
10   Left dive LD                    (0, -βimax, ρimax)
11   Right dive RD                   (0, -βimax, -ρimax)
where αimax, βimax and ρimax are the maximum acceleration values in each respective direction.
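The maneuver library above maps naturally onto a lookup table from maneuver label to control quantity (αi, βi, ρi); a sketch with placeholder maximum accelerations (the numerical values are illustrative only, and a corresponding 5-entry table would hold the target's maneuvers):

# Per-axis maximum accelerations; the values are placeholders, not taken from the patent.
ALPHA_MAX, BETA_MAX, RHO_MAX = 9.0, 6.0, 6.0   # m/s^2

# Control quantities (alpha_i, beta_i, rho_i) for the 11 UAV maneuvers.
UAV_MANEUVERS = {
    "F":  (0.0,        0.0,       0.0),       # steady forward flight
    "A":  ( ALPHA_MAX, 0.0,       0.0),       # accelerated forward flight
    "S":  (-ALPHA_MAX, 0.0,       0.0),       # decelerated forward flight
    "L":  (0.0,        0.0,       RHO_MAX),   # left turn
    "R":  (0.0,        0.0,      -RHO_MAX),   # right turn
    "C":  (0.0,        BETA_MAX,  0.0),       # climb
    "LC": (0.0,        BETA_MAX,  RHO_MAX),   # left climb
    "RC": (0.0,        BETA_MAX, -RHO_MAX),   # right climb
    "D":  (0.0,       -BETA_MAX,  0.0),       # dive
    "LD": (0.0,       -BETA_MAX,  RHO_MAX),   # left dive
    "RD": (0.0,       -BETA_MAX, -RHO_MAX),   # right dive
}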
For the target, there are 5 maneuvers in total: uniform forward motion, accelerated forward motion, decelerated forward motion, left turn and right turn. The target's motion is governed only by the horizontal acceleration αi and the lateral acceleration ρi; the magnitudes of these control quantities are held constant, and maneuvers are executed at maximum overload so that turning maneuvers are completed as quickly as possible.
Accordingly, the relationship between the course angle and the control quantities can be established, yielding the control-quantity table for the target maneuvers shown below:
No.  Maneuver                        Control quantity (αi, ρi)
1    Uniform forward motion F        (0, 0)
2    Accelerated forward motion A    (αimax, 0)
3    Decelerated forward motion S    (-αimax, 0)
4    Left turn L                     (0, ρimax)
5    Right turn R                    (0, -ρimax)

where αimax and ρimax are the maximum acceleration values in each respective direction.
Step S15, constructing a maneuvering attacking and defending library of both confrontation parties;
Taking a ship as an example, as shown in fig. 4, the target's defensive means include: passive jamming, such as a chaff (foil) cloud, which is launched by the target and moves outward as a whole in a straight line at constant speed; active jamming, which forms an echo region with radiation angle α whose attack direction can be adjusted at rate u; and dense-array (close-in) defense, located on both sides of the target, with attack angle θ and an attack direction adjustable at rate v.
The details are summarized in the following table:

Defense means          Characteristics
Chaff (foil) cloud     Passive jamming; launched by the target, moves outward in a straight line at constant speed
Active jamming         Forms an echo region with radiation angle α; attack direction adjustable at rate u
Dense-array defense    Located on both sides of the target; attack angle θ; attack direction adjustable at rate v
The unmanned aerial vehicle's defensive strategies are mainly anti-interference strategies, including: changing its RCS through its own intelligent skin so as to avoid enemy detection; releasing a jamming decoy, i.e. when detected and countered by the enemy the vehicle can actively release the decoy to create a false target that deceives the enemy; and route planning, i.e. re-planning the flight route.
The details are summarized in the following table:

Anti-interference means    Characteristics
RCS change                 The intelligent skin changes the vehicle's RCS to avoid enemy detection
Jamming decoy release      When detected and countered, a decoy is actively released to form a false target
Route planning             The flight route is re-planned
The confrontation between the unmanned aerial vehicle cluster and the target is a continuous maneuvering process. From the game-theoretic perspective this process is discretized: at decision time t, each unmanned aerial vehicle (and likewise the target) selects a maneuver from the maneuver library and the maneuver attack-and-defense library according to the current-stage situation; the maneuver continues for Δt, after which the next stage begins, as shown in fig. 5.
The maneuver and attack-and-defense strategy is made at a decision time and continues until the next decision time, so the three-dimensional maneuver and attack-and-defense strategy of the unmanned aerial vehicle can be expressed as the stage transition

s_i(t) → s_{i,j}(t + Δt)

where s_i(t) is the situation information of confrontation node (unmanned aerial vehicle) i at the current stage, and s_{i,j}(t + Δt) is the situation information of node i at the next stage after performing maneuver j.
Both confrontation parties select, as the current-stage decision, the maneuver that yields the greatest combat advantage in the next stage; this repeated selection forms the confrontation game, which continues until one side locks onto the other and the engagement ends. A schematic of this decision cycle is given in the sketch after this paragraph.
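A compact sketch of the discretized decision cycle (the callables are hypothetical hooks standing in for the decision and propagation logic described above):

def confrontation_loop(state, dt, max_steps, choose_uav_action,
                       choose_target_action, propagate, engagement_over):
    """Discretized maneuver-decision cycle.

    choose_uav_action(state) / choose_target_action(state) return a maneuver label,
    propagate(state, a, o, dt) advances the situation by one stage of length dt,
    engagement_over(state) reports whether one side has locked onto the other.
    """
    for _ in range(max_steps):
        a = choose_uav_action(state)        # UAV-side decision at time t
        o = choose_target_action(state)     # target-side decision at time t
        state = propagate(state, a, o, dt)  # both maneuvers last for dt
        if engagement_over(state):
            break
    return state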
Step S2, constructing a random game model by taking the two confrontation parties as agents and taking the two-person zero-sum game as a condition;
A random game can be represented as a tuple (n, S, A1, …, An, T, γ, R1, …, Rn), whose elements are:
(1) Number n: the number of players.
(2) State S: the state is a description of the environment; after an agent takes an action the state changes, and the evolution is Markovian.
(3) Action A: an action is a description of an agent's behavior and is the result of a decision. The action space may be discrete or continuous.
(4) Transfer function T: the state transition is governed by the players' current state s and one action Ai from each agent, with transition probabilities lying in [0, 1].
(5) Discount factor γ: the discount factor is the decay applied to future rewards, γ ∈ [0, 1].
(6) Return function R: the reward obtained by a designated player in state s′ after the joint action (A1, …, An) is taken in state s.
Each agent in the game environment is defined by the state set S and an action set A1, …, Ak; the state transition is controlled by the current state s and one action Ai from each agent, and each agent has an associated reward function whose expected discounted sum it tries to maximize. In a random game, the next state and the players' returns depend only on the current state and the current actions of all players. Solving the random game requires finding a strategy π that maximizes each player's future discounted return under the discount factor γ.
The invention takes both confrontation parties, the unmanned aerial vehicle and the target, as agents and models the confrontation game under the two-player zero-sum game condition.
First, the state space S, the action space A and the reward function R required by each agent in the random game environment are determined. For the current state s the agent decides on an action Ai, reaches the next state s′ and obtains the feedback reward r from its interaction with the environment; the next round of interaction then begins, forming a cycle.
1) The number n: the total number of players in the two-party confrontation, namely the unmanned aerial vehicle number and the target number.
2) State S: based on the factors that influence the situation of the two parties, the state features can be determined; they mainly comprise the position coordinates (x, y, z), velocity v, relative distance R, azimuth φ and target incident angle q of the two confrontation parties.
The state space of the game can therefore be expressed as:

S = {x, y, z, v, R, φ, q}
since the state space of the confrontation is a continuous infinite space, a learning method is required to deal with these features.
3) Action A: the unmanned aerial vehicle has 11 selectable maneuvers: steady forward flight F, accelerated forward flight A, decelerated forward flight S, left turn L, right turn R, climb C, left climb LC, right climb RC, dive D, left dive LD and right dive RD. Constructing a discrete action space, the unmanned aerial vehicle's action space is Ap = {F, A, S, L, R, C, LC, RC, D, LD, RD}. The target has 5 selectable maneuvers: uniform forward motion F, accelerated forward motion A, decelerated forward motion S, left turn L and right turn R, giving the target action space AT = {F, A, S, L, R}.
4) Transfer function T: taking the unmanned aerial vehicle as an example, the probability that its current state s transfers to the next state s′ under the joint behavior (a, o), where a is the action the unmanned aerial vehicle selects according to its policy and o is the action selected by the opponent target.
5) Discount factor γ: the discount factor is chosen in [0,1], for example around 0.9.
6) Return function R: in the random game, Q(s, a, o) denotes the expected reward when our side takes action a and the adversary takes action o in state s. Based on the unmanned aerial vehicle's attack zone, reaching the unmanned aerial vehicle's attack range is defined as the favorable situation. For the unmanned aerial vehicle's reward value r, the return is r = 1 if the unmanned aerial vehicle reaches the favorable situation and r = -1 if the enemy target reaches the favorable situation.
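A minimal sketch of this terminal reward, assuming hypothetical predicates that report whether either side has brought the other into its attack envelope:

def terminal_reward(uav_in_advantage: bool, target_in_advantage: bool) -> float:
    """r = +1 if the UAV reaches the favorable situation, r = -1 if the enemy
    target reaches it, otherwise 0."""
    if uav_in_advantage:
        return 1.0
    if target_in_advantage:
        return -1.0
    return 0.0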
In the countermeasure process, the situation parameters of the two parties mainly include the position coordinates, the speed, the relative distance, the azimuth angle and the target incident angle of the two parties, and then the advantage reward function is as follows:
Figure BDA0003412640210000111
in the formula, rpiTRepresenting unmanned plane piThe advantage situation reward with respect to the target T, Δ d represents the euclidean distance between the two parties, Δ h represents the height difference between the two parties,
Figure BDA0003412640210000112
representing unmanned plane piAzimuth angle with respect to target T, q denotes drone piThe target incident angle of.
For a multi-player random game with known return and transfer functions, the aim is to obtain a Nash-equilibrium solution, i.e. a joint strategy in which each agent's strategy is a probability distribution over its action space. Since in a game environment the expected return is influenced by the opponent's strategy, the opponent's actions are generally unpredictable in a two-player adversarial game.
On this basis, the invention adopts the Minimax algorithm to select the optimal strategy of the random game. Assuming the opponent has strong decision-making capability, our side selects the action that maximizes its own payoff on the premise that the opponent selects the action that minimizes our payoff. The significance of the Minimax algorithm is therefore to obtain the maximum return in the worst case.
The random game value function represents the expected discounted return obtained under the optimal strategy. The state value function V(s) and the state-action value function Q(s, a, o) are respectively:

V(s) = E[ Σ_t γ^t · r_t | s_0 = s ]
Q(s, a, o) = E[ Σ_t γ^t · r_t | s_0 = s, a_0 = a, o_0 = o ]

where the transitions between successive states are governed by T(s, a, o, s′), the probability of moving from state s to state s′ under actions a and o.
From this, the optimal value function V(s) in random-game state s can be expressed as:

V(s) = max_{π(s,·)} min_{o ∈ AT} Σ_{a ∈ Ap} π(s, a) · Q(s, a, o)

According to this formula, the optimal strategy π and the optimal value function V in state s can be obtained by a linear-programming constraint method.
The action-state value function Q(s, a, o) for unmanned aerial vehicle action a and target action o in state s is:

Q(s, a, o) = R(s, a, o) + γ · Σ_{s′} T(s, a, o, s′) · V(s′)

By iterating this recursive equation, a converged optimal value function can be obtained, and from it the optimal strategy π.
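A compact sketch of the linear program that extracts the minimax value V(s) and the mixed strategy π(s, ·) from the Q values of one state (SciPy-based; the |Ap| x |AT| array Q_s of Q(s, a, o) values is an assumed input):

import numpy as np
from scipy.optimize import linprog

def minimax_value_and_policy(Q_s):
    """Solve  max_pi min_o sum_a pi(a) * Q_s[a, o]  by linear programming.

    Variables x = [pi_1, ..., pi_n, v]; maximize v (i.e. minimize -v) subject to
    v - sum_a pi(a) * Q_s[a, o] <= 0 for every opponent action o,
    sum_a pi(a) = 1 and pi >= 0.
    """
    n_a, n_o = Q_s.shape
    c = np.zeros(n_a + 1)
    c[-1] = -1.0                                   # minimize -v
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])  # one row per opponent action o
    b_ub = np.zeros(n_o)
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    v, pi = res.x[-1], res.x[:n_a]
    return v, pi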
Because both players of the game use mixed strategies, that is, neither player deterministically selects a single action but instead assigns a selection probability to every action, and these probabilities are exactly the optimal strategy π obtained by linear programming, the invention selects actions by roulette-wheel selection: the higher an action's probability (fitness), the more likely it is to be selected.
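Sampling an action from the mixed strategy π (roulette-wheel selection) is then straightforward, for example:

import numpy as np

def roulette_select(pi, rng=None):
    """Pick an action index with probability proportional to pi."""
    rng = np.random.default_rng() if rng is None else rng
    pi = np.clip(np.asarray(pi, dtype=float), 0.0, None)  # guard against tiny negatives
    return int(rng.choice(len(pi), p=pi / pi.sum()))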
Step S3, solving the model by adopting deep reinforcement learning;
since the transfer function is difficult to determine in a game context, for a state transfer function T related to a conventional method for iteratively solving an MDP (Markov decision process) using a value, an asynchronous update mode Q-learning in reinforcement learning may be used instead.
Q-learning updates the current action-value function with a temporal-difference target; each time action a is taken in state s, transitioning to state s′ with reward r, the update is

Q(s, a) = r + γ·V(s′)   (9)

Because each such update is performed with probability exactly T(s, a, s′), the explicit transfer function can be dispensed with. Applied to the random game, the Q-learning update becomes

Q_t(s, a, o) = (1 - α)·Q_{t-1}(s, a, o) + α·(r + γ·V(s′))   (10)

where α is the learning rate, r is the currently obtained reward, and γ is the discount factor, i.e. the decay applied to future rewards.
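Equation (10) in code form, given the minimax state value of the successor state (a sketch; in practice the continuous state is handled by the neural network described next, so the tabular flavor here is illustrative only):

def minimax_q_update(q_prev, r, v_next, alpha=0.1, gamma=0.9):
    """Eq. (10): Q_t(s,a,o) = (1 - alpha) * Q_{t-1}(s,a,o) + alpha * (r + gamma * V(s'))."""
    return (1.0 - alpha) * q_prev + alpha * (r + gamma * v_next)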
Compared with conventional Q-learning, the Minimax-Q method incorporates game-theoretic reasoning: the max operator in Q-learning is replaced by the minimax value, which yields the optimal strategy required under game conditions.
The states involved in the two-sided game form a continuous, infinite space, so a deep neural network is needed to process the state features. The Minimax-Q method is therefore extended further: a deep neural network is added to approximate the value function, the reinforcement-learning process is trained with experience replay, and a separate target network is set up to handle the temporal-difference target.
DQN replaces the linear function approximation of Q-learning with a nonlinear approximation in the form of neural-network parameters, and can therefore handle the high-dimensional nonlinear input data arising in the adversarial game. The action-value function of DQN corresponds to a set of parameters; θ denotes the weights of each layer of the network, and updating the value function in fact means updating the parameters θ.
Therefore, the invention stores the current state s obtained from the agent-environment interaction, the action a taken by the red side (unmanned aerial vehicle), the action o taken by the blue side (target), the corresponding reward value r, and the next state s′ reached by executing the action as a quintuple {s, a, o, r, s′} in a memory bank; data of a certain size are randomly drawn from the memory bank as training samples, and the target Q value used to train the neural network is calculated according to equation (10).
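A minimal replay-memory sketch for the {s, a, o, r, s′} quintuples (deque-based; the capacity and batch size are illustrative defaults):

import random
from collections import deque

class ReplayMemory:
    """Stores (s, a, o, r, s_next) quintuples and samples random mini-batches."""

    def __init__(self, capacity=50000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, o, r, s_next):
        self.buffer.append((s, a, o, r, s_next))

    def sample(self, batch_size=64):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))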
As shown in fig. 6, the algorithm steps of Minimax-DQN include:
step S31: initializing, setting an initial state of both parties, initializing the memory bank, and setting an observation value.
Step S32: create two neural networks, a Q network and a target network, where the Q network parameters are θ and the target network parameters are θ⁻. The input of each network is the state s and the output is the action-state value function Q; after a certain number of learning steps, the Q network parameters are copied to the target network.
Step S33: the following loop traversal process is performed:
S331: the unmanned aerial vehicle agent selects an action a according to the current state s and the strategy π and executes it, obtaining the next state s′ and the reward r. The action o selected by the target agent in state s is observed, and the quintuple {s, a, o, r, s′} is stored in the memory bank.
S332: part of the data is randomly drawn from the memory bank as training samples. The next-state values s′ of the training samples are taken as the input of the neural network, and Q[s′] for state s′ is obtained from the network output.
S333: the minimax state value V[s′] is obtained by linear programming from the minimax value expression above, and the target Q value target_Q is calculated according to equation (9).
S334: and calculating a loss function, optimizing by adopting a gradient descent method, and updating Q network parameters.
Step S34: perform the linear-programming solution of the minimax value expression using the Q values output by the trained neural network to obtain the optimal strategy π.
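A condensed PyTorch-style sketch of steps S31-S34 under the following assumptions: the Q network outputs an |Ap| x |AT| matrix of Q(s, a, o) values, minimax_value_and_policy, roulette_select and ReplayMemory are the helpers sketched earlier, and env is a hypothetical confrontation environment whose step(a) returns the next state, the observed target action, the reward and a done flag:

import torch
import torch.nn as nn

def train_minimax_dqn(env, n_actions_p=11, n_actions_t=5, state_dim=7,
                      episodes=500, gamma=0.9, sync_every=200, batch_size=64):
    # Step S32: Q network and target network; the target network starts as a copy.
    def make_net():
        return nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                             nn.Linear(128, 128), nn.ReLU(),
                             nn.Linear(128, n_actions_p * n_actions_t))
    q_net, target_net = make_net(), make_net()
    target_net.load_state_dict(q_net.state_dict())
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    memory = ReplayMemory()
    step = 0

    for _ in range(episodes):                       # Step S31: initial state per episode
        s, done = env.reset(), False
        while not done:                             # Step S33: loop traversal
            # S331: choose the UAV action from the mixed strategy of the current Q(s,.,.)
            with torch.no_grad():
                q_s = q_net(torch.as_tensor(s, dtype=torch.float32))
            _, pi = minimax_value_and_policy(
                q_s.reshape(n_actions_p, n_actions_t).numpy())
            a = roulette_select(pi)
            s_next, o, r, done = env.step(a)        # o: observed target action
            memory.push(s, a, o, r, s_next)
            s = s_next

            # S332-S334: sample a batch, build minimax targets, take a gradient step
            batch = memory.sample(batch_size)
            loss = torch.zeros(())
            for bs, ba, bo, br, bs_next in batch:
                with torch.no_grad():
                    q_next = target_net(torch.as_tensor(bs_next, dtype=torch.float32))
                    v_next, _ = minimax_value_and_policy(
                        q_next.reshape(n_actions_p, n_actions_t).numpy())
                target_q = br + gamma * v_next      # target per eq. (9)/(10)
                q_pred = q_net(torch.as_tensor(bs, dtype=torch.float32)
                               ).reshape(n_actions_p, n_actions_t)[ba, bo]
                loss = loss + (q_pred - target_q) ** 2
            optimizer.zero_grad()
            (loss / max(len(batch), 1)).backward()
            optimizer.step()

            step += 1
            if step % sync_every == 0:              # copy Q-network parameters to target net
                target_net.load_state_dict(q_net.state_dict())

    # Step S34: at deployment, the strategy pi for a state is recovered by the same
    # linear program applied to the trained network's Q values.
    return q_net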
The invention thus provides, for the autonomous decision-making problem of an unmanned aerial vehicle cluster under multi-target confrontation, an intelligent decision-solving method based on the multi-body game and deep reinforcement learning: a two-party confrontation model is established, an intelligent decision problem model based on the multi-body game is constructed, and deep reinforcement learning is used to handle the continuous, infinite state space involved in the two-party confrontation game, so that a decision scheme is obtained and the cluster's autonomous decision-making on cooperative anti-interference means against multiple targets is realized.
Specifically, the cooperative anti-interference autonomous decision problem model is constructed using the multi-body game; the target's defensive means are chaff-cloud launching, active jamming and dense-array defense, and the unmanned aerial vehicle's defensive means are RCS changing, active jamming release, towed-decoy release and route planning. The Minimax-DQN algorithm model is used to solve the unmanned aerial vehicle cluster's cooperative anti-interference strategy selection against the multiple targets.
By adopting the invention, when the unmanned aerial vehicle cluster confronts multiple targets in the game, it can autonomously select various anti-interference means against the enemy by relying on the intelligent decision-making function based on the multi-body game and deep reinforcement learning, so as to meet the anti-interference application requirements of the cluster in the confrontation environment. Meanwhile, against defensive means adopted by enemy targets such as chaff-cloud launching, active jamming and dense-array defense, the unmanned aerial vehicle cluster can autonomously select RCS changing, active jamming release, towed-decoy release, route planning and other anti-interference means, thereby realizing a cooperative anti-interference confrontation function and improving the task execution capability, survivability and cooperative confrontation effectiveness of the intelligent unmanned aerial vehicle cluster.
According to another aspect of the present invention, a cooperative autonomous decision-making device for an unmanned aerial vehicle cluster based on a multi-body game is also disclosed, as shown in fig. 7, comprising: a processor 401, a memory 402 and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the unmanned aerial vehicle cluster cooperative autonomous decision-making method according to the above scheme.
In addition, the invention also discloses a non-transitory readable storage medium, wherein the readable storage medium stores a program, and the program is executed by a processor to realize the unmanned aerial vehicle group collaborative autonomous decision-making method.
It should be understood that the processor mentioned in the embodiments of the present invention may be implemented by hardware or may be implemented by software. When implemented in hardware, the processor may be a logic circuit, an integrated circuit, or the like. When implemented in software, the processor may be a general-purpose processor implemented by reading software code stored in a memory.
The processor may be, for example, a Central Processing Unit (CPU), other general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will be appreciated that the memory referred to in embodiments of the invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (9)

1. A cooperative and autonomous decision-making method for an unmanned aerial vehicle cluster based on a multi-body game is characterized by comprising the following steps:
constructing a confrontation model of the unmanned aerial vehicle cluster and the target, wherein the confrontation model comprises motion models of the unmanned aerial vehicles and the target, a maneuver library for both confrontation parties and a maneuver attack-and-defense library;
taking the two confrontation parties as agents and constructing a random game model under the two-player zero-sum game condition;
and solving the random game model by adopting deep reinforcement learning to obtain an optimal strategy.
2. The cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to claim 1, wherein the motion models of the unmanned aerial vehicles and the target are each expressed by a particle motion equation, and the parameters representing the confrontation situation of the unmanned aerial vehicle cluster and the target comprise the position coordinates, velocity, relative distance, azimuth and target incident angle of the confrontation parties.
3. The cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to claim 2, wherein in the random game model the state S consists of the position coordinates (x, y, z), velocity v, relative distance R, azimuth φ and target incident angle q of the two confrontation parties, expressed as:

S = {x, y, z, v, R, φ, q}
4. The cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to claim 3, wherein in the random game model the action space Ap of the unmanned aerial vehicles comprises 11 actions and the action space AT of the target comprises 5 actions.
5. The cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to claim 2, wherein in the random game model the advantage reward function has the form

r_{piT} = f(Δd, Δh, φ, q)

where r_{piT} denotes the advantage-situation reward of unmanned aerial vehicle pi with respect to the target T, Δd denotes the Euclidean distance between the two parties, Δh denotes the height difference between the two parties, φ denotes the azimuth of unmanned aerial vehicle pi relative to the target T, and q denotes the target incident angle of unmanned aerial vehicle pi.
6. The cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to claim 1, wherein a current state s, an action a taken by the unmanned aerial vehicle, an action o taken by the target, the corresponding reward value r and the next state s′ reached by executing the action are stored in a memory bank as a quintuple {s, a, o, r, s′}; data of a certain size are randomly extracted from the memory bank as training samples, and a target Q value is calculated to train the neural network.
7. The cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to claim 6, wherein solving the random game model by deep reinforcement learning comprises the following steps:
step S31: setting an initial state of both sides, initializing a memory bank, and setting an observation value;
step S32: creating a Q network and a target network, wherein the Q network parameters are θ and the target network parameters are θ⁻; the input of each network is the state s and the output is the action-state value function Q, and after a certain number of learning steps the Q network parameters are copied to the target network;
step S33: the following loop traversal process is performed:
s331: the unmanned aerial vehicle selects an action a according to the current state s and the strategy pi and executes the action a to obtain the next state s' and the obtained reward r; observing an action o selected by a target in a state s, and storing a { s, a, o, r, s' } quintuple in a memory bank;
s332: randomly extracting part of the data from the memory bank as training samples, taking the next-state values s′ of the training samples as the input of the neural network, and obtaining Q[s′] for state s′ from the network output;
s333: obtaining the minimax state value V[s′] by linear programming, and calculating the target Q value target_Q;
s334: calculating a loss function, optimizing by adopting a gradient descent method, and updating Q network parameters;
step S34: and performing linear programming solution by using the Q value output by the trained neural network to obtain an optimal strategy pi.
8. A cooperative autonomous decision-making device for an unmanned aerial vehicle cluster based on a multi-body game, comprising a processor and a memory, wherein the memory stores a computer program, and the processor is configured to execute the computer program to implement the cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to any one of claims 1 to 7.
9. A non-transitory storage medium storing a computer program which, when executed by a processor, implements the cooperative autonomous decision-making method for an unmanned aerial vehicle cluster according to any one of claims 1 to 7.
CN202111534368.2A 2021-12-15 2021-12-15 Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game Pending CN114460959A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111534368.2A CN114460959A (en) 2021-12-15 2021-12-15 Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game


Publications (1)

Publication Number Publication Date
CN114460959A true CN114460959A (en) 2022-05-10

Family

ID=81405914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111534368.2A Pending CN114460959A (en) 2021-12-15 2021-12-15 Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game

Country Status (1)

Country Link
CN (1) CN114460959A (en)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052511A (en) * 2020-06-15 2020-12-08 成都蓉奥科技有限公司 Air combat maneuver strategy generation technology based on deep random game

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Liu Zuandong: "Intelligent decision-making for multi-UAV cooperative attack and defense based on target intention prediction", China Master's Theses Full-text Database, Engineering Science & Technology II, pages 031-635 *
Xu Kangfa: "Intelligent decision-making and evaluation methods for multi-aircraft cooperative air combat", China Master's Theses Full-text Database, Engineering Science & Technology II, pages 032-16 *
Ma Wen, et al.: "Close-range air combat maneuver decision-making based on deep stochastic games", Systems Engineering and Electronics, no. 2, pages 443-451 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114727407A (en) * 2022-05-12 2022-07-08 中国科学院自动化研究所 Resource allocation method, device and equipment
CN114727407B (en) * 2022-05-12 2022-08-26 中国科学院自动化研究所 Resource allocation method, device and equipment
CN115113642A (en) * 2022-06-02 2022-09-27 中国航空工业集团公司沈阳飞机设计研究所 Multi-unmanned aerial vehicle space-time key feature self-learning cooperative confrontation decision-making method
CN114911269A (en) * 2022-06-17 2022-08-16 电子科技大学 Networking radar interference strategy generation method based on unmanned aerial vehicle cluster
CN115877871A (en) * 2023-03-03 2023-03-31 北京航空航天大学 Non-zero and game unmanned aerial vehicle formation control method based on reinforcement learning
CN116795108A (en) * 2023-06-09 2023-09-22 西南交通大学 Intelligent unmanned vehicle distribution method based on multi-source sensing signals
CN116795108B (en) * 2023-06-09 2023-12-01 西南交通大学 Intelligent unmanned vehicle distribution method based on multi-source sensing signals
CN117806364B (en) * 2023-12-22 2024-05-28 华中科技大学 Fight learning architecture, control method and device for aircraft path tracking controller
CN117707219A (en) * 2024-02-05 2024-03-15 西安羚控电子科技有限公司 Unmanned aerial vehicle cluster investigation countermeasure method and device based on deep reinforcement learning
CN117707219B (en) * 2024-02-05 2024-05-17 西安羚控电子科技有限公司 Unmanned aerial vehicle cluster investigation countermeasure method and device based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
CN114460959A (en) Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game
CN113589842B (en) Unmanned cluster task cooperation method based on multi-agent reinforcement learning
CN113791634B (en) Multi-agent reinforcement learning-based multi-machine air combat decision method
CN112783209B (en) Unmanned aerial vehicle cluster confrontation control method based on pigeon intelligent competition learning
CN115291625A (en) Multi-unmanned aerial vehicle air combat decision method based on multi-agent layered reinforcement learning
Wang et al. Improving maneuver strategy in air combat by alternate freeze games with a deep reinforcement learning algorithm
CN111240353A (en) Unmanned aerial vehicle collaborative air combat decision method based on genetic fuzzy tree
CN113741525B (en) Policy set-based MADDPG multi-unmanned aerial vehicle cooperative attack and defense countermeasure method
CN112198892B (en) Multi-unmanned aerial vehicle intelligent cooperative penetration countermeasure method
CN111859541A (en) PMADDPG multi-unmanned aerial vehicle task decision method based on transfer learning improvement
Wang et al. UAV swarm confrontation using hierarchical multiagent reinforcement learning
CN112651486A (en) Method for improving convergence rate of MADDPG algorithm and application thereof
CN114721424A (en) Multi-unmanned aerial vehicle cooperative countermeasure method, system and storage medium
CN113222106A (en) Intelligent military chess deduction method based on distributed reinforcement learning
CN114510078A (en) Unmanned aerial vehicle maneuver evasion decision-making method based on deep reinforcement learning
CN114638339A (en) Intelligent agent task allocation method based on deep reinforcement learning
CN116700079A (en) Unmanned aerial vehicle countermeasure occupation maneuver control method based on AC-NFSP
Liu et al. Using CIGAR for finding effective group behaviors in RTS game
CN116225049A (en) Multi-unmanned plane wolf-crowd collaborative combat attack and defense decision algorithm
CN113741186B (en) Double-aircraft air combat decision-making method based on near-end strategy optimization
Hagelbäck Multi-agent potential field based architectures for real-time strategy game bots
CN114167899B (en) Unmanned plane bee colony collaborative countermeasure decision-making method and system
CN115859778A (en) Air combat maneuver decision method based on DCL-GWOO algorithm
CN115061495A (en) Unmanned aerial vehicle group confrontation autonomous control method based on eagle pigeon game
Liu et al. Multiagent reinforcement learning with regret matching for robot soccer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination