CN116360503B - Unmanned plane game countermeasure strategy generation method and system and electronic equipment - Google Patents


Info

Publication number
CN116360503B
CN116360503B
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
red
target
blue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310628021.7A
Other languages
Chinese (zh)
Other versions
CN116360503A (en)
Inventor
刘昊
吕金虎
王新迪
高庆
刘德元
钟森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Academy of Mathematics and Systems Science of CAS
Original Assignee
Beihang University
Academy of Mathematics and Systems Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University, Academy of Mathematics and Systems Science of CAS filed Critical Beihang University
Priority to CN202310628021.7A priority Critical patent/CN116360503B/en
Publication of CN116360503A publication Critical patent/CN116360503A/en
Application granted granted Critical
Publication of CN116360503B publication Critical patent/CN116360503B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/106Change initiated in response to external conditions, e.g. avoidance of elevated terrain or of no-fly zones
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The application provides an unmanned aerial vehicle game countermeasure strategy generation method, system and electronic device, relating to the technical field of aircraft control. The method includes: inputting the track prediction results of all target blue unmanned aerial vehicles and the attack target prediction results of all target blue unmanned aerial vehicles into a pre-trained intention interpretation model to output the blue unmanned aerial vehicle cluster state; inputting the blue unmanned aerial vehicle cluster state, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicles at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model to output the preferred action of the red unmanned aerial vehicle; and controlling the red unmanned aerial vehicle to move according to the determined preferred action, thereby improving the accuracy of unmanned aerial vehicle game defense strategy generation.

Description

Unmanned plane game countermeasure strategy generation method and system and electronic equipment
Technical Field
The application relates to the technical field of aircraft control, in particular to a method, a system and electronic equipment for generating a game countermeasure strategy of an unmanned aerial vehicle.
Background
The autonomous generation technology for unmanned aerial vehicle game defense strategies refers to a technology by which an unmanned aerial vehicle cluster autonomously generates a game strategy in a combat environment, based on the battlefield situation and the perceived information of both friendly and enemy sides, so as to counter the enemy's combat intent, protect friendly ground targets and achieve the friendly combat objective. In the prior art, existing strategy generation methods have low decision accuracy in scenarios where the enemy unmanned aerial vehicle cluster performs deception and feint maneuvers, so a strategy generation algorithm with higher decision accuracy is needed.
Disclosure of Invention
Therefore, the application aims to provide a method, a system and an electronic device for generating an unmanned aerial vehicle game countermeasure strategy, so as to improve the accuracy of unmanned aerial vehicle game defense strategy generation.
In a first aspect, the present application provides a method for generating an unmanned aerial vehicle game countermeasure policy, the method comprising: for each red unmanned aerial vehicle, determining the action to be executed by the red unmanned aerial vehicle at the next moment and controlling the red unmanned aerial vehicle to move according to the determined action, in the following manner: acquiring a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle, and inputting it into a pre-trained track prediction model to output a track prediction result for each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle; inputting the track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model to output an attack target prediction result for each target blue unmanned aerial vehicle; inputting the track prediction results and the attack target prediction results of all target blue unmanned aerial vehicles into a pre-trained intention interpretation model to output the blue unmanned aerial vehicle cluster state; and inputting the blue unmanned aerial vehicle cluster state, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicles at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model to output the preferred action of the red unmanned aerial vehicle, and controlling the red unmanned aerial vehicle to move according to the determined preferred action; wherein a target blue unmanned aerial vehicle is a blue unmanned aerial vehicle within the monitoring range of the red unmanned aerial vehicle, and the other red unmanned aerial vehicles are red unmanned aerial vehicles within the monitoring range of the red unmanned aerial vehicle.
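The four-stage pipeline of the first aspect can be sketched as a thin orchestration function. The four callables below are hypothetical stand-ins for the pre-trained models (trajectory prediction, attack-target prediction, intention interpretation, game policy); their names and signatures are illustrative assumptions, not the application's actual implementation:

```python
def next_action(history_by_blue, rel_blue, rel_red, motion_state,
                predict_traj, predict_targets, interpret_intent, game_policy):
    """Chain the claimed pipeline for one red drone at one decision step.

    history_by_blue: {blue_id: historical track sequence}
    rel_blue / rel_red: relative position info vs. monitored blue / red drones
    motion_state: this red drone's current motion state
    The four model callables are injected so any trained model can be used."""
    # Step 1: trajectory prediction per monitored blue drone
    trajs = {b: predict_traj(seq) for b, seq in history_by_blue.items()}
    # Step 2: attack-target prediction from all trajectory predictions
    targets = predict_targets(trajs)
    # Step 3: intention interpretation -> blue cluster state
    cluster_state = interpret_intent(trajs, targets)
    # Step 4: cluster mean-field game policy -> preferred action
    return game_policy(cluster_state, rel_blue, rel_red, motion_state)
```

With stub models this runs end-to-end, which is useful for wiring tests before the real networks are plugged in.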
Preferably, the cluster average field random game model outputs the preferred action of each red unmanned aerial vehicle by: determining the action space of the red unmanned aerial vehicle cluster from a game countermeasure mechanism library according to the blue unmanned aerial vehicle cluster state; determining the Markov transition probability distribution of the red unmanned aerial vehicle according to the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment and the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment; and, taking the Markov transition probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables and the action executed by the red unmanned aerial vehicle as the dependent variable, solving for the action in the action space of the red unmanned aerial vehicle cluster that satisfies the Nash equilibrium condition, as the preferred action of the red unmanned aerial vehicle.
Preferably, the blue unmanned aerial vehicle cluster state at least includes formation, grouping and combat mode, and the step of determining the action space of the red unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the blue unmanned aerial vehicle cluster state specifically includes: matching a corresponding unmanned aerial vehicle cluster countermeasure scheme from the game countermeasure mechanism library according to the blue unmanned aerial vehicle cluster state and the number of red unmanned aerial vehicles, wherein the game countermeasure mechanism library includes a plurality of unmanned aerial vehicle cluster countermeasure schemes, and each scheme indicates the actions each red unmanned aerial vehicle is to execute, arranged in time order; and determining, from the matched scheme, the time-ordered actions corresponding to each red unmanned aerial vehicle, so as to generate the action space of the red unmanned aerial vehicle cluster.
Preferably, the relative position information includes the line-of-sight angle between the two unmanned aerial vehicles, the entry angle between the speed vector of the target unmanned aerial vehicle and the line of sight, the angle between the velocities of the two unmanned aerial vehicles, the distance between the two unmanned aerial vehicles, and the relative speed between the two unmanned aerial vehicles, and the Markov transition probability distribution of the red unmanned aerial vehicle is determined as follows:

The total potential field energy $E_i$ of the red unmanned aerial vehicle is determined by the following formula:

$$E_i=\sum_{j\in\mathcal{N}_i^{R}}\left(U_{ij}^{\mathrm{ang}}+U_{ij}^{\mathrm{dist}}+U_{ij}^{\mathrm{vel}}\right)+\sum_{k\in\mathcal{N}_i^{B}}\left(V_{ik}^{\mathrm{ang}}+V_{ik}^{\mathrm{dist}}+V_{ik}^{\mathrm{vel}}\right)$$

wherein $U_{ij}^{\mathrm{ang}}$ is the angle cooperative potential field between this red unmanned aerial vehicle $i$ and another red unmanned aerial vehicle $j$, $U_{ij}^{\mathrm{dist}}$ is the distance cooperative potential field between them, and $U_{ij}^{\mathrm{vel}}$ is the speed cooperative potential field between them; $V_{ik}^{\mathrm{ang}}$ is the angle power potential field between this red unmanned aerial vehicle $i$ and target blue unmanned aerial vehicle $k$, $V_{ik}^{\mathrm{dist}}$ is the distance power potential field between them, and $V_{ik}^{\mathrm{vel}}$ is the velocity power potential field between them; $\mathcal{N}_i^{R}$ is the set of red unmanned aerial vehicles corresponding to red unmanned aerial vehicle $i$, and $\mathcal{N}_i^{B}$ is the set of blue unmanned aerial vehicles corresponding to red unmanned aerial vehicle $i$.
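As an illustration, the summation over cooperative and adversarial potential-field terms can be computed as below. The closed forms of the six individual field terms are not given in this passage, so the quadratic distance/velocity terms and the velocity-angle term used here are assumptions; only the overall two-sum structure mirrors the patent's formula:

```python
import math

def total_potential_energy(red_i, other_reds, target_blues,
                           w_coop=(1.0, 1.0, 1.0), w_adv=(1.0, 1.0, 1.0)):
    """Sum cooperative potential fields (vs. friendly drones) and
    adversarial ('power') potential fields (vs. target blue drones).

    Each drone is a dict with 'pos' = (x, y, z) and 'vel' = (vx, vy, vz);
    nonzero velocities are assumed. The per-term forms are illustrative."""
    def dist(a, b):
        return math.dist(a['pos'], b['pos'])

    def vel_diff(a, b):
        return math.dist(a['vel'], b['vel'])

    def angle(a, b):
        # angle between the two velocity vectors (assumes nonzero speeds)
        dot = sum(x * y for x, y in zip(a['vel'], b['vel']))
        na, nb = math.hypot(*a['vel']), math.hypot(*b['vel'])
        return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

    energy = 0.0
    for j in other_reds:            # cooperative terms
        energy += (w_coop[0] * angle(red_i, j)
                   + w_coop[1] * dist(red_i, j) ** 2
                   + w_coop[2] * vel_diff(red_i, j) ** 2)
    for k in target_blues:          # adversarial terms
        energy += (w_adv[0] * angle(red_i, k)
                   + w_adv[1] * dist(red_i, k) ** 2
                   + w_adv[2] * vel_diff(red_i, k) ** 2)
    return energy
```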
determining the Markov transition probability distribution of the red unmanned aerial vehicle through the following formula:

$$p_i\left(s_i^{t+1}\mid s_i^{t},a_i^{t}\right)=\frac{\exp\left(-E_i\left(s_i^{t+1}\right)\right)}{\sum_{s'\in\mathcal{S}}\exp\left(-E_i\left(s'\right)\right)}$$

wherein $s_i^{t}$ is the motion state of this red unmanned aerial vehicle $i$ at the current moment $t$, and $a_i^{t}$ is the action executed by this red unmanned aerial vehicle $i$ at the current moment $t$.
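One common way to turn a potential-field energy into a transition distribution is a Boltzmann (softmax) weighting over candidate next states, favouring low-energy states; the original formula image is not recoverable here, so this functional form is an assumption for illustration:

```python
import math

def transition_distribution(energies, temperature=1.0):
    """Boltzmann distribution over candidate next states.

    `energies` maps each candidate next state to its total potential-field
    energy E_i; lower energy gets higher probability."""
    # subtract the minimum before exponentiating, for numerical stability
    m = min(energies.values())
    weights = {s: math.exp(-(e - m) / temperature)
               for s, e in energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}
```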
Preferably, the step of taking the Markov transition probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables and solving for the action in the action space that satisfies the Nash equilibrium condition, as the preferred action of the red unmanned aerial vehicle, specifically includes:

determining the action that satisfies the Nash equilibrium condition as the preferred action of the red unmanned aerial vehicle through the following formula:

$$a_i^{*}=\arg\max_{a\in\mathcal{A}}\ \mathbb{E}\left[\sum_{t=0}^{\infty}\gamma^{t}\,r_i\left(s_i^{t},a_i^{t}\right)\right]$$

wherein $r_i$ is the reward function of this red unmanned aerial vehicle $i$, $a_i^{*}$ is the preferred action of this red unmanned aerial vehicle $i$, $\gamma$ is the discount rate, and $\mathcal{A}$ is the action space.
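For a single drone with the other players' behaviour held fixed, the equilibrium-action solve described above reduces to maximizing expected discounted return over the action space. The tabular value-iteration sketch below is a standard MDP solver used as an illustration, not the patent's own algorithm:

```python
def value_iteration(states, actions, P, R, gamma=0.9, iters=200):
    """Solve for max expected discounted return.

    P[s][a] is a dict {next_state: probability}; R[s][a] is the immediate
    reward. Returns the value function and the greedy (preferred-action)
    policy."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        # Bellman optimality backup
        V = {s: max(R[s][a] + gamma * sum(p * V[s2]
                                          for s2, p in P[s][a].items())
                    for a in actions)
             for s in states}
    # greedy policy w.r.t. the converged values
    policy = {s: max(actions,
                     key=lambda a: R[s][a] + gamma *
                     sum(p * V[s2] for s2, p in P[s][a].items()))
              for s in states}
    return V, policy
```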
Preferably, the method further includes storing the acquired cluster states and the acquired scene information corresponding to the unmanned aerial vehicles in the game countermeasure mechanism library, for optimizing and updating the intention interpretation model and the cluster average field random game model.
Preferably, the intention interpretation model is generated by training in the following way: acquiring a training data set, wherein the training data set includes a plurality of groups of data samples, each data sample including a plurality of sample blue unmanned aerial vehicle track sequences and the corresponding blue unmanned aerial vehicle cluster state; and constructing a target fuzzy neural network model, wherein the target fuzzy neural network model includes an input layer, a fuzzification layer, a fuzzy inference layer and an output layer. The fuzzification layer includes a preset number of fuzzy nodes determined according to the statistical number of combat modes, each fuzzy node corresponding to a different membership function:

$$\mu_{ij}\left(x_i\right)=\exp\left(-\left(\frac{x_i-c_{ij}}{\sigma_{ij}}\right)^{2}\right)$$

wherein $x_i$ is the track sequence corresponding to input node $i$ in the input layer, $\mu_{ij}$ is the membership function corresponding to fuzzy node $j$ connected to input node $i$, $c_{ij}$ is the first target parameter, and $\sigma_{ij}$ is the second target parameter. The fuzzy inference layer includes a plurality of inference nodes, and the calculation rule of each inference node is:

$$w_j=\prod_{i}\mu_{ij}\left(x_i\right)$$
the output layer comprises a plurality of output nodes, and the definition function formula of each output node is as follows:
wherein ,is a third target parameter; and inputting the training data set into the constructed target fuzzy neural network model, and adjusting a first target parameter, a second target parameter and a third target parameter in the target fuzzy neural network model based on a mixed algorithm combining back propagation and a least square method so as to acquire a pre-trained intention interpretation model.
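The described layout (parameterized membership functions, product-rule inference, normalized weighted output, hybrid backpropagation/least-squares fitting) matches the standard ANFIS-style fuzzy network. A minimal forward pass under the assumption of Gaussian membership functions — the patent's exact forms are not recoverable from this text:

```python
import math

def gaussian_mu(x, c, sigma):
    """Membership function with center c (first target parameter) and
    width sigma (second target parameter)."""
    return math.exp(-((x - c) / sigma) ** 2)

def fuzzy_forward(x, centers, sigmas, p):
    """One forward pass of a Sugeno-style fuzzy network.

    x: input vector; centers/sigmas: per-rule parameter lists;
    p: consequent parameters (third target parameter).
    Fuzzify each input, combine per rule by product (inference layer),
    then output the firing-strength-weighted mean of p."""
    w = [math.prod(gaussian_mu(xi, c[i], s[i]) for i, xi in enumerate(x))
         for c, s in zip(centers, sigmas)]
    z = sum(w)
    return sum(wj * pj for wj, pj in zip(w, p)) / z
```

In ANFIS-style training, the consequent parameters `p` are fitted by least squares and the membership parameters by backpropagation, matching the hybrid algorithm named above.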
In a second aspect, the present application provides an unmanned aerial vehicle game countermeasure policy generation system, the system comprising a control module configured to determine, for each red unmanned aerial vehicle, the action to be executed by the red unmanned aerial vehicle at the next moment and to control the red unmanned aerial vehicle to move according to the determined action. The control module includes: a track prediction unit configured to acquire a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle and input it into a pre-trained track prediction model to output a track prediction result for each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle; an attack target prediction unit configured to input the track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model to output an attack target prediction result for each target blue unmanned aerial vehicle; an intention interpretation unit configured to input the track prediction results and the attack target prediction results of all target blue unmanned aerial vehicles into a pre-trained intention interpretation model to output the blue unmanned aerial vehicle cluster state; and a game countermeasure unit configured to input the blue unmanned aerial vehicle cluster state, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicles at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model to output the preferred action of the red unmanned aerial vehicle, and to control the red unmanned aerial vehicle to move according to the determined preferred action; wherein a target blue unmanned aerial vehicle is a blue unmanned aerial vehicle within the monitoring range of the red unmanned aerial vehicle, and the other red unmanned aerial vehicles are red unmanned aerial vehicles within the monitoring range of the red unmanned aerial vehicle.
In a third aspect, the present application also provides an electronic device, comprising a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the unmanned aerial vehicle game countermeasure policy generation method described above.
In a fourth aspect, the present application also provides a computer-readable storage medium having a computer program stored thereon which, when executed by a processor, performs the steps of the unmanned aerial vehicle game countermeasure policy generation method described above.
The method for generating an unmanned aerial vehicle game countermeasure strategy provided by the application determines, for each red unmanned aerial vehicle, the action to be executed at the next moment and controls the red unmanned aerial vehicle accordingly: a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle is input into a pre-trained track prediction model to output a track prediction result for each target blue unmanned aerial vehicle; the track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment are input into a pre-trained attack target prediction model to output an attack target prediction result for each target blue unmanned aerial vehicle; the track prediction results and the attack target prediction results of all target blue unmanned aerial vehicles are input into a pre-trained intention interpretation model to output the blue unmanned aerial vehicle cluster state; and the blue unmanned aerial vehicle cluster state, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicles at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment are input into a pre-trained cluster average field random game model to output the preferred action of the red unmanned aerial vehicle, which is then controlled to move accordingly; a target blue unmanned aerial vehicle is a blue unmanned aerial vehicle within the monitoring range of the red unmanned aerial vehicle, and the other red unmanned aerial vehicles are red unmanned aerial vehicles within its monitoring range. Because the behaviors and intents of the blue unmanned aerial vehicles are analyzed through deep learning before strategy generation, and the red side's game countermeasure strategy is dynamically generated and adjusted, both the accuracy and the timeliness of decisions are improved.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered limiting of the scope; other related drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of a method for generating a game countermeasure policy for an unmanned aerial vehicle according to an embodiment of the present application;
Fig. 2 is a flowchart illustrating steps for determining a preferred action according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a scenario for generating a game countermeasure strategy according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a first unmanned aerial vehicle game countermeasure simulation according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a second unmanned aerial vehicle game countermeasure simulation according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a change in the number of unmanned aerial vehicles according to an embodiment of the present application;
Fig. 7 is a block diagram of an unmanned aerial vehicle game countermeasure policy generation system according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. Based on the embodiments of the present application, every other embodiment obtained by a person skilled in the art without making any inventive effort falls within the scope of protection of the present application.
First, an application scenario to which the present application is applicable will be described. The method can be applied to game countermeasure strategy generation for cooperative unmanned aerial vehicle combat, and is particularly suitable for generating control strategies for rotary-wing unmanned aerial vehicles.
Research has found that the autonomous generation technology for unmanned aerial vehicle game defense strategies refers to a technology by which an unmanned aerial vehicle cluster automatically generates a game strategy in a combat environment, based on the battlefield situation and the perceived information of both friendly and enemy sides, so as to counter the enemy's combat intent, protect friendly ground targets and achieve the friendly combat objective. In the prior art, for example, Lei et al. put forward an optimal strategy based on complete information and Markov models to realize attack and defense against a moving target; Carter et al. considered the dynamic switching of game models under different attack scenarios and thereby proposed a strategy generation algorithm; Garcia et al. modeled the unmanned aerial vehicle cluster game problem as a differential game problem of cluster strikes and, by establishing a whole-process performance function and giving a strike capability evaluation function, derived the anti-strike guidance law for the unmanned aerial vehicle cluster in that scenario.
Based on this, embodiments of the present application provide a method, a system and an electronic device for generating an unmanned aerial vehicle game countermeasure strategy, so as to improve the accuracy of unmanned aerial vehicle game defense strategy generation.
Referring to fig. 1 and 3, fig. 1 is a flowchart of a method for generating a game countermeasure policy for an unmanned aerial vehicle according to an embodiment of the present application, and fig. 3 is a schematic diagram of a scenario for generating a game countermeasure policy according to an embodiment of the present application. As shown in fig. 1, the method for generating the game countermeasure policy of the unmanned aerial vehicle provided by the embodiment of the application includes:
For each red unmanned aerial vehicle, the action to be executed by the red unmanned aerial vehicle at the next moment is determined, and the red unmanned aerial vehicle is controlled to move according to the determined action, in the following manner:
Here, in one embodiment of the present application, the action executed by the red unmanned aerial vehicle changes dynamically; that is, the action each red unmanned aerial vehicle executes at the next moment is determined based on the historical data collected by that red unmanned aerial vehicle. The historical data here includes, but is not limited to, the states of the blue unmanned aerial vehicles and the scene information collected by the red unmanned aerial vehicle over a period of time. Actions here include, but are not limited to, the speed and attitude of the unmanned aerial vehicle, whether to attack, and the like.
S101, acquiring a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle, and inputting it into a pre-trained track prediction model to output a track prediction result for each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle.
In step S101, the track prediction model is trained and generated based on an LSTM (Long Short-Term Memory) neural network model. The input of the track prediction model is the track sequence of each target blue unmanned aerial vehicle; a target blue unmanned aerial vehicle is any blue unmanned aerial vehicle monitored within the sensing range of the red unmanned aerial vehicle. The historical track sequence may be continuous or discrete, for example the continuous three-dimensional position coordinates of the target blue unmanned aerial vehicle over the past 10 seconds, or its three-dimensional position coordinates at the 8th, 5th and 3rd seconds. The output of the track prediction model is the predicted track sequence corresponding to each input target blue unmanned aerial vehicle, for example its three-dimensional position coordinates within the next 5 seconds or the next 3 seconds; the shorter the predicted track sequence, the more favorable it is for the prediction in step S102.
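To make the LSTM step concrete, here is a single cell update and an untrained position-prediction head in pure Python. The weight layout and the linear output head are illustrative assumptions (a real model would be trained in a deep-learning framework); with untrained weights the output is only shape-correct, not accurate:

```python
import math

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step; x, h, c are plain Python lists.

    W maps gate name ('i', 'f', 'o', 'g') to a weight matrix over the
    concatenated vector [h, x]; b maps gate name to a bias vector."""
    v = h + x  # concatenate previous hidden state and current input
    def lin(g):
        return [sum(wi * vi for wi, vi in zip(row, v)) + bg
                for row, bg in zip(W[g], b[g])]
    i = [_sigmoid(z) for z in lin('i')]   # input gate
    f = [_sigmoid(z) for z in lin('f')]   # forget gate
    o = [_sigmoid(z) for z in lin('o')]   # output gate
    g = [math.tanh(z) for z in lin('g')]  # candidate cell state
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def predict_next_position(history, W, b, W_out, hidden=4):
    """Run the LSTM over a history of 3-D positions and project the
    final hidden state to a predicted next position."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for pos in history:
        h, c = lstm_step(list(pos), h, c, W, b)
    return [sum(wi * hi for wi, hi in zip(row, h)) for row in W_out]
```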
S102, inputting track prediction results of all target blue-side unmanned aerial vehicles corresponding to all red-side unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model so as to output attack target prediction results of each target blue-side unmanned aerial vehicle.
The attack target prediction model is trained and generated using an IDCNN (Iterated Dilated Convolutional Neural Network) model. The input of the attack target prediction model is the track prediction result of each target blue unmanned aerial vehicle output by the track prediction model. The output of the attack target prediction model is the attack target prediction result of each target blue unmanned aerial vehicle; for example, the attack target of target blue unmanned aerial vehicle A is red unmanned aerial vehicle B, the attack target of target blue unmanned aerial vehicle C is red ground target D, and so on. The attack target at least includes a red unmanned aerial vehicle or a red ground target.
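The building block an IDCNN iterates is the dilated (atrous) convolution, which spaces kernel taps apart to widen the receptive field without adding parameters. A minimal 1-D sketch of that operation (illustrative only; a real IDCNN stacks several such layers with increasing dilation and repeats the stack):

```python
def dilated_conv1d(seq, kernel, dilation):
    """1-D dilated convolution: kernel taps are spaced `dilation`
    steps apart. With dilation=1 this is an ordinary valid-mode
    cross-correlation."""
    span = (len(kernel) - 1) * dilation  # input width covered by the kernel
    return [sum(k * seq[i + j * dilation] for j, k in enumerate(kernel))
            for i in range(len(seq) - span)]
```

For example, a two-tap averaging kernel with dilation 2 combines every input with the value two steps ahead.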
S103, inputting track prediction results of all the target blue unmanned aerial vehicles and attack target prediction results of all the target blue unmanned aerial vehicles into a pre-trained intention interpretation model so as to output a blue unmanned aerial vehicle cluster state.
The intention interpretation model is generated by training a fuzzy neural network model. The input of the intention interpretation model is the behavior sequence of each target blue unmanned aerial vehicle, determined from the output results of the track prediction model and the attack target prediction model. The output of the intention interpretation model is the blue unmanned aerial vehicle cluster state, where the cluster state at least includes information such as formation, grouping and combat mode, for example: a first formation grouping to strike our ground targets, a second formation clustering to reconnoitre our information, a third formation clustering to jam our unmanned aerial vehicle B, and the like.
S104, inputting the cluster state of the blue unmanned aerial vehicle, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment and the motion state of the red unmanned aerial vehicle at the current moment into a cluster average field random game model trained in advance so as to output the preferred motion of the red unmanned aerial vehicle and control the red unmanned aerial vehicle to move according to the determined preferred motion.
The target blue unmanned aerial vehicle is a blue unmanned aerial vehicle in the red unmanned aerial vehicle monitoring range, and other red unmanned aerial vehicles are red unmanned aerial vehicles in the red unmanned aerial vehicle monitoring range.
The cluster mean field random game model is generated through reinforcement-learning-based training by an expert-knowledge-aided mean field cluster game strategy generation algorithm. Inputs of the cluster mean field random game model comprise the enemy situation (relative geometric information between pairs of aircraft), the enemy unmanned aerial vehicle cluster state, and the like. The output of the cluster mean field random game model is the preferred action to be executed by the red unmanned aerial vehicle at the next moment.
As shown in fig. 2, fig. 2 is a flowchart illustrating steps for determining a preferred action according to an embodiment of the present application. Specifically, the cluster mean field random game model outputs the preferred action of each red unmanned aerial vehicle through the following steps:
S201, determining the action space of the red unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the state of the blue unmanned aerial vehicle cluster.
The blue-side unmanned aerial vehicle cluster state at least comprises a formation, a grouping and a combat mode, and the step of determining the action space of the red-side unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the blue-side unmanned aerial vehicle cluster state specifically comprises the following steps:
according to the cluster state of the blue unmanned aerial vehicle and the number of the red unmanned aerial vehicles, matching a corresponding unmanned aerial vehicle cluster countermeasure scheme from a game countermeasure mechanism library, wherein the game countermeasure mechanism library comprises a plurality of unmanned aerial vehicle cluster countermeasure schemes, and each unmanned aerial vehicle cluster countermeasure scheme is used for indicating each red unmanned aerial vehicle to execute actions according to time sequence arrangement. And determining the execution actions which are arranged according to the time sequence and correspond to each red unmanned aerial vehicle according to the matched unmanned aerial vehicle cluster countermeasure scheme so as to generate an action space of the red unmanned aerial vehicle cluster.
The game countermeasure library may be pre-established, and the game countermeasure mechanism library includes a plurality of unmanned aerial vehicle cluster countermeasure schemes, where the unmanned aerial vehicle cluster countermeasure schemes are used to instruct each red unmanned aerial vehicle in the cluster to execute actions according to time sequence, for example, move to enemies according to a predetermined patrol formation, catch attack enemies according to 2V1, and so on.
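The matching of a countermeasure scheme from the game countermeasure mechanism library can be sketched as a simple keyed lookup. The library keys, scheme contents and function names below are hypothetical illustrations, not the patented data structures:

```python
# Hypothetical sketch of a game-countermeasure mechanism library: each entry
# maps (blue combat mode, required red-UAV count) to per-UAV, time-ordered
# action lists. All keys and actions are illustrative placeholders.
MECHANISM_LIBRARY = {
    ("strike_ground", 4): {"uav_1": ["intercept", "pursue"],
                           "uav_2": ["intercept", "pursue"],
                           "uav_3": ["flank", "pursue"],
                           "uav_4": ["standby", "intercept"]},
    ("recon", 2): {"uav_1": ["jam", "shadow"],
                   "uav_2": ["shadow", "jam"]},
}

def match_scheme(blue_combat_mode, n_red):
    """Return the action space (per-UAV time-ordered actions) of the fullest
    scheme whose required red-UAV count the current cluster can satisfy."""
    candidates = [(mode, need) for (mode, need) in MECHANISM_LIBRARY
                  if mode == blue_combat_mode and need <= n_red]
    if not candidates:
        return None
    key = max(candidates, key=lambda k: k[1])   # prefer the fullest scheme
    return MECHANISM_LIBRARY[key]

action_space = match_scheme("strike_ground", n_red=5)
```

The lookup returns the time-ordered execution actions for each red unmanned aerial vehicle, which together form the action space of the red cluster.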
The unmanned aerial vehicle cluster countermeasure scheme herein may be expressed as a set U = \{u_1, u_2, \dots, u_N\}, wherein u_i denotes the i-th red unmanned aerial vehicle. The action space here can be expressed as A = \{a_1, a_2, \dots, a_M\}; at any decision moment, the action a_i to be taken by unmanned aerial vehicle u_i relates not only to itself but also to the whole cluster.
S202, determining Markov transition probability distribution of the red unmanned aerial vehicle according to the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment and the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment.
The relative position information comprises a sight angle between two unmanned aerial vehicles, an entry angle between a speed vector of a target unmanned aerial vehicle and the sight, an included angle between speeds of the two unmanned aerial vehicles, a distance between the two unmanned aerial vehicles and a relative speed between the two unmanned aerial vehicles, and the Markov transition probability distribution of the red unmanned aerial vehicle is determined in the following mode:
the total potential field energy E_i of the red unmanned aerial vehicle is determined by the following formula:

E_i = \sum_{j \in A_i} \left( E^{\theta}_{ij} + E^{d}_{ij} + E^{v}_{ij} \right) + \sum_{k \in B_i} \left( \bar{E}^{\theta}_{ik} + \bar{E}^{d}_{ik} + \bar{E}^{v}_{ik} \right)

wherein E^{\theta}_{ij} is the angle cooperative potential field between this red unmanned aerial vehicle i and another red unmanned aerial vehicle j, E^{d}_{ij} is the distance cooperative potential field between them, and E^{v}_{ij} is the speed cooperative potential field between them; \bar{E}^{\theta}_{ik} is the angle power potential field between this red unmanned aerial vehicle i and the target blue unmanned aerial vehicle k, \bar{E}^{d}_{ik} is the distance power potential field between them, and \bar{E}^{v}_{ik} is the speed power potential field between them; A_i is the set of red unmanned aerial vehicles corresponding to red unmanned aerial vehicle i, and B_i is the corresponding set of blue unmanned aerial vehicles;
determining the Markov transition probability distribution of the red unmanned aerial vehicle as

P\left(s^{i}_{t+1} \mid s^{i}_{t}, a^{i}_{t}, E_i\right)

wherein s^{i}_{t} is the motion state of this red unmanned aerial vehicle i at the current moment t, a^{i}_{t} is the execution action of this red unmanned aerial vehicle i at the current moment t, and E_i is its total potential field energy.
The mean field cluster game strategy generation algorithm here, which may be expert-knowledge aided, is based on reinforcement learning. The perception information acquired by unmanned aerial vehicle i is mainly the relative geometric information between two aircraft, which can be expressed as (\lambda, \varphi, d, \theta, v_r), respectively representing the line-of-sight angle between the speed vector of unmanned aerial vehicle i and the line of sight between the two aircraft, the target entry angle between the speed vector of the target unmanned aerial vehicle and the line of sight, the distance between the two aircraft, the angle between the speeds of the two aircraft, and the relative speed of the two aircraft.
Consider a cluster of N unmanned aerial vehicles. Denote the set of the N_a friendly unmanned aerial vehicles nearest to the i-th unmanned aerial vehicle as A_i, and the set of the N_b enemy unmanned aerial vehicles nearest to it as B_i. Introducing the potential field concept, define the potential field radiated to my i-th unmanned aerial vehicle by unmanned aerial vehicle j in my cluster as the cooperative potential fields E^{\theta}_{ij}, E^{d}_{ij}, E^{v}_{ij}, respectively representing angular uniformity, distance uniformity and speed uniformity. At the same time, define the potential field radiated to my i-th unmanned aerial vehicle by unmanned aerial vehicle k in the enemy cluster as the power potential fields \bar{E}^{\theta}_{ik}, \bar{E}^{d}_{ik}, \bar{E}^{v}_{ik}, respectively representing enemy angle power, distance power and speed power.
The step establishes interaction among clustered individuals, and can calculate the total potential field energy and the quantitative situation of the clustered individuals according to the relative position relation with the neighborhood individuals, so as to adopt maneuvering strategies.
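A minimal sketch of this step follows. The concrete potential-field functions (cosine angle terms, quadratic distance and speed terms) and the Boltzmann-style mapping from energies to a transition distribution are assumptions for illustration; the embodiment does not fix these forms:

```python
import numpy as np

# Illustrative total potential-field energy: cooperative terms over
# neighbouring red UAVs plus power terms over nearby blue UAVs. The field
# functions below are assumed, not taken from the embodiment.
def pair_energy(rel_angle, rel_dist, rel_speed, d_ref):
    # one (angle + distance + speed) contribution for a neighbouring UAV
    return (1.0 - np.cos(rel_angle)) + (rel_dist - d_ref) ** 2 + rel_speed ** 2

def total_energy(red_neighbors, blue_neighbors, d_coop=10.0, d_attack=5.0):
    e = sum(pair_energy(a, d, v, d_coop) for a, d, v in red_neighbors)
    e += sum(pair_energy(a, d, v, d_attack) for a, d, v in blue_neighbors)
    return e

def transition_distribution(energies):
    # Assumed Boltzmann-style mapping from candidate next-state energies to
    # a Markov transition distribution (lower energy -> more likely).
    w = np.exp(-np.asarray(energies, dtype=float))
    return w / w.sum()

# two candidate next states: the first keeps better cooperative spacing
p = transition_distribution(
    [total_energy([(0.1, 9.0, 0.2)], [(0.3, 6.0, 0.5)]),
     total_energy([(0.1, 12.0, 0.2)], [(0.3, 6.0, 0.5)])])
```

With this mapping, candidate maneuvers that lower the total potential field energy receive higher transition probability, which is one way to quantify the "situation" described above.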
S203, taking the Markov transition probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables, taking the execution action of the red unmanned aerial vehicle as the dependent variables, and solving the execution action meeting the Nash equilibrium condition in the action space as the preferable action of the red unmanned aerial vehicle.
The method specifically comprises the steps of taking potential field probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables, taking the execution action of the red unmanned aerial vehicle as the dependent variables, and solving the execution action meeting Nash equilibrium conditions in an action space as the preferable action of the red unmanned aerial vehicle, wherein the steps comprise the following steps:
Determining the execution action meeting the Nash equilibrium condition as the preferable action of the red unmanned aerial vehicle through the following formula:

a^{i*}_{t} = \arg\max_{a \in A} \; \mathbb{E}\left[ \sum_{t} \gamma^{t} r^{i}_{t} \right]

wherein r^{i}_{t} is the reward of this red unmanned aerial vehicle i, a^{i*}_{t} is the preferable action of this red unmanned aerial vehicle i, \gamma is the discount rate, and A is the action space.
Here the reward r^{i}_{t} is defined through the overall impact of the cluster on the individual, and the Markov transition probability for the individual state is defined as

P\left(s^{i}_{t+1} \mid s^{i}_{t}, a^{i}_{t}, E_i\right)

wherein s^{i}_{t} is the current unmanned aerial vehicle state. For reward r^{i}_{t} and discount rate \gamma, the objective of unmanned aerial vehicle i is to maximize the expected cumulative discounted reward function, i.e.

\max \; \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} r^{i}_{t} \right]
The process establishes an average field random game model of the clusters, rewards obtained by the clusters are influenced by state probability distribution of other unmanned aerial vehicles, and the clusters are mutually influenced through state and income functions.
For a cluster system containing N unmanned aerial vehicles, given the state s^{i} of each unmanned aerial vehicle i and the probability distribution resulting from the potential field, determine an optimal strategy \pi^{i*} to satisfy the Nash equilibrium condition:

V^{i}\left(\pi^{i*}, \pi^{-i*}\right) \ge V^{i}\left(\pi^{i}, \pi^{-i*}\right), \quad \forall \pi^{i}

the formula defining the value function is:

V^{i}(s) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} r^{i}_{t} \,\middle|\, s^{i}_{0} = s \right]

and the Bellman formula can be obtained according to the dynamic programming principle as follows:

V^{i}(s) = \max_{a \in A} \left[ r^{i}(s, a) + \gamma \sum_{s'} P\left(s' \mid s, a, E_i\right) V^{i}(s') \right]
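The Bellman recursion can be solved numerically by value iteration. The sketch below runs it on a fabricated 3-state, 2-action MDP standing in for a single unmanned aerial vehicle's sequential decision problem; all rewards and transition probabilities are illustrative only:

```python
import numpy as np

# Toy value iteration on V(s) = max_a [ r(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
# for a fabricated 3-state, 2-action MDP (state 2 is an absorbing "goal").
n_states, n_actions, gamma = 3, 2, 0.9
P = np.zeros((n_states, n_actions, n_states))
P[0, 0] = [0.8, 0.2, 0.0]; P[0, 1] = [0.1, 0.1, 0.8]
P[1, 0] = [0.0, 1.0, 0.0]; P[1, 1] = [0.0, 0.2, 0.8]
P[2, 0] = [0.0, 0.0, 1.0]; P[2, 1] = [0.0, 0.0, 1.0]
R = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0]])  # r(s, a)

V = np.zeros(n_states)
for _ in range(500):                 # iterate to a (near) fixed point
    Q = R + gamma * (P @ V)          # Q[s, a], one Bellman backup
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new
policy = Q.argmax(axis=1)            # greedy (preferred) action per state
```

The greedy policy extracted from the converged value function is the optimal sequential decision for this toy problem; the embodiment's cluster version additionally conditions the transitions on the potential-field distribution.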
when the method is used for solving the problem, the state transition probability needs to be obtained, meanwhile, the design of rewarding and punishing functions is needed in the training process, and quantized data is obtained through conversion of expert knowledge. Solving the problems can obtain the optimal sequential decision of the single unmanned aerial vehicle, and the whole cluster can emerge the cluster behavior such as autonomous formation, grouping number, killing mode and the like by considering the whole cluster.
According to the unmanned aerial vehicle game countermeasure strategy generation method provided by the embodiment of the application, the behavior and the intention of the blue unmanned aerial vehicle are analyzed through deep learning before strategy generation, so that the game countermeasure strategy of the red unmanned aerial vehicle is dynamically generated and adjusted, and the accuracy and timeliness of decision making are improved.
In one embodiment of the application, the trajectory prediction model and the attack target prediction model are generated by training the following processes:
aiming at the dynamics of the enemy clusters, neural-network-based spatial situation prediction modeling, inversion and online optimization are developed to realize dynamic prediction of enemy cluster behavior characteristics.
Aiming at problems such as the rapid change of battlefield situation, the difficulty of prediction in the game countermeasure process, and the multiple constraints of the game model, in this embodiment a neural network model is designed based on a long short-term memory (LSTM) neural network to predict the attack path of the enemy cluster, and a classifier for predicting the attack target is then designed based on the LSTM and an iterated dilated convolutional neural network (IDCNN). Target track prediction based on deep learning requires no model of the target in use, overcoming the defect of traditional algorithms that acceleration changes are difficult to predict when target aerodynamic parameters are unknown.
Specifically, the LSTM layer first extracts time-sequence characteristics of the blue unmanned aerial vehicle track, and the IDCNN layer then searches for local characteristics, thereby completing classification of targets. For track prediction, LSTM has the characteristics of high calculation speed and good real-time performance, so the track prediction network is designed with LSTM alone. Flight tracks of enemy aircraft over a period of time are collected to construct a track database for training and generating the track prediction model.
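Constructing such a track database can be sketched as slicing collected tracks into history/future window pairs. The window lengths and the random-walk stand-in for a real flight track are assumptions for illustration:

```python
import numpy as np

# Turn a collected enemy flight track into supervised pairs for trajectory
# prediction: each input is a window of past positions, each label the
# positions that follow it. Window sizes are assumed, not from the patent.
def make_windows(track, n_past=8, n_future=4):
    """track: (T, 3) array of positions. Returns (X, Y) with
    X: (N, n_past, 3) history windows, Y: (N, n_future, 3) targets."""
    T = len(track)
    xs, ys = [], []
    for t in range(T - n_past - n_future + 1):
        xs.append(track[t:t + n_past])
        ys.append(track[t + n_past:t + n_past + n_future])
    return np.stack(xs), np.stack(ys)

# a random-walk track standing in for a recorded enemy trajectory
track = np.cumsum(np.random.default_rng(0).normal(size=(50, 3)), axis=0)
X, Y = make_windows(track)
```

Each (X, Y) pair is one training sample for the LSTM track predictor; at inference time only the history window is available.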
In the LSTM, first, the data retained from the previous moment is determined; this part consists of a forget gate, whose function expression is:

f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)

wherein h_{t-1} and x_t respectively represent the output value of the LSTM network at the previous moment and the input value at the current moment, \sigma represents the activation function, W_f represents the forget gate weight, b_f indicates the forget gate bias, and f_t is the forget gate output vector.
The forget gate decides how much of the previous-moment state is transmitted by converting the combination of the input and the previous-moment state into a value between 0 and 1 through the activation function. The input gate then decides the update information of the cell state; the input gate function expressions are:

i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)

\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)

wherein W_i and b_i respectively represent the weight and bias of the input gate, W_C and b_C respectively represent the weight and bias for extracting effective information, \tilde{C}_t represents the current candidate value to be updated into the cell state, and the input gate weight i_t controls which features of \tilde{C}_t are eventually updated into the cell state C_t. The cell state function expression is:

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
Finally, the final result is output by the output gate, whose function expressions are:

o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)

h_t = o_t \odot \tanh(C_t)

wherein W_o and b_o are the weight and bias of the output gate, the first formula calculates the output gate weight o_t from the state at the current moment, and h_t is the final LSTM output, determined by the output gate weight and the cell state. Through these three gating units and the cell state transfer, the LSTM can process time-sequence problems and can further be used for scenes such as time-sequence track prediction.
For the attack target prediction model, the currently available partial track data and expert data are first input into the trained track prediction model to obtain a plurality of track prediction results; these track prediction results are then used as the input of the classifier, and the attack target is predicted by the classifier.
In one embodiment of the application, aiming at characteristics of the enemy clusters such as intent antagonism, high dynamics and deception, interpretation and modeling analysis of enemy intent under cluster game countermeasure conditions are developed based on the acquired situation information and an expert knowledge base, adopting a fuzzy neural network and an inverse reinforcement learning framework. A tactical intention model is constructed based on the fuzzy neural network; countermeasure samples formed from enemy target attributes and the corresponding tactical intentions are used for training, and neural networks formed from different source data are combined to realize intelligent reasoning of the enemy combat intention, from whose basic actions the enemy cluster state is deduced, thereby improving the accuracy and rapidity of interpretation. Based on the inverse reinforcement learning framework, an intention equilibrium solution analysis is given, the confidence of the predicted trend is further evaluated, and the interpretability is further improved, supporting the planning and generation of my gaming strategy. The intent interpretation model here is trained and generated as follows:
And acquiring a training data set, wherein the training data set comprises a plurality of groups of data samples, and each data sample comprises a track sequence of a plurality of sample blue unmanned aerial vehicles and a corresponding blue unmanned aerial vehicle cluster state.
The fuzzy neural network system can be built on a fuzzy system fused with a neural network and converted into an adaptive network to realize the learning process of the T-S (Takagi-Sugeno) fuzzy type.
And constructing a target fuzzy neural network model, wherein the target fuzzy neural network model comprises an input layer, a fuzzy reasoning layer and an output layer.
Firstly, the input layer is constructed, and the input vector fed into the network through the nodes of the input layer is recorded as x = (x_1, x_2, \dots, x_n).
Then, the fuzzification layer is constructed. The fuzzification layer comprises a preset number of fuzzy nodes determined according to the statistical number of combat modes, and each fuzzy node corresponds to a different membership function, whose formula is:

\mu_{ij}(x_i) = \exp\left( -\frac{(x_i - c_{ij})^2}{\sigma_{ij}^2} \right)

wherein x_i is the track-sequence input corresponding to input node i in the input layer, \mu_{ij} is the membership function corresponding to the j-th fuzzy node connected to input node i, c_{ij} is the first target parameter, and \sigma_{ij} is the second target parameter. Each input node corresponds to m fuzzy nodes, and the membership degrees of input node i are denoted \mu_{i1}, \mu_{i2}, \dots, \mu_{im}.
A fuzzy inference layer is then built; each inference node represents a rule (i.e., an enemy intent such as interference, strike or kill) and is used for computing the fitness (firing strength) of the rule.
Specifically, the fuzzy inference layer includes a plurality of inference nodes, and the calculation rule formula of each inference node is:

\alpha_j = \prod_{i=1}^{n} \mu_{ij}(x_i)
and finally, building a definition layer, and clearing and outputting the data of the fuzzy reasoning layer.
The output layer comprises a plurality of output nodes, and the defuzzification function formula of each output node is:

y = \sum_{j} \bar{\alpha}_j w_j, \qquad \bar{\alpha}_j = \frac{\alpha_j}{\sum_{k} \alpha_k}

wherein w_j is the third target parameter. The training data set is input into the constructed target fuzzy neural network model, and the first target parameter, the second target parameter and the third target parameter in the target fuzzy neural network model are adjusted based on a hybrid algorithm combining back propagation and the least squares method, so as to obtain the pre-trained intent interpretation model.
Specifically, in the training stage, the number of nodes in each layer needs to be determined first, the fuzzy inference layer is given through specific operation rules, and the parameters to be learned are mainly the first, second and third target parameters. Then, given the target attributes obtained after sensor data fusion and the corresponding tactical intent countermeasure samples, the target behavior intent is comprehensively calculated from different data sources and converted into the training data set. After data preparation is finished, the parameter training stage begins: parameters are trained and learned through back propagation or a hybrid algorithm of back propagation and the least squares method, adjusting the parameters of the system. In the hybrid algorithm, least squares estimation is used to identify the weight parameters when the forward pass reaches the defuzzification layer, the error signal is propagated backwards in the reverse pass, and the membership function parameters are updated by back propagation. The hybrid method reduces the search space of the back propagation method, thereby improving the training speed of the fuzzy neural network.
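A forward pass through such a network can be sketched in NumPy, assuming Gaussian membership functions (centres and widths as the first and second target parameters), product-rule inference, and a normalized weighted sum with the third target parameters as rule weights:

```python
import numpy as np

# ANFIS-style forward pass sketch of the intent-interpretation network.
# Concrete shapes and parameter values are illustrative, not trained.
def fuzzy_forward(x, centres, widths, rule_weights):
    """x: (n_in,); centres/widths: (n_in, n_rules); rule_weights: (n_rules,)."""
    # fuzzification layer: Gaussian membership of each input to each rule
    mu = np.exp(-((x[:, None] - centres) ** 2) / widths ** 2)
    alpha = mu.prod(axis=0)            # inference layer: rule firing strengths
    alpha_bar = alpha / alpha.sum()    # normalization
    return alpha_bar @ rule_weights    # defuzzified output

rng = np.random.default_rng(1)
n_in, n_rules = 4, 3                   # assumed toy dimensions
y = fuzzy_forward(rng.normal(size=n_in),
                  rng.normal(size=(n_in, n_rules)),     # first target params
                  np.full((n_in, n_rules), 1.5),        # second target params
                  np.array([0.0, 1.0, 2.0]))            # third target params
```

Training would adjust the centres/widths by back propagation and the rule weights by least squares, as described above; here they are fixed for the sake of a runnable sketch.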
In one embodiment of the application, the method further comprises the step of storing the acquired cluster states and the acquired scene information corresponding to the unmanned aerial vehicle of the red and blue parties in a game countermeasure mechanism library for optimizing and updating an intention interpretation model and a cluster average field random game model.
And constructing simulation scenes according to different cluster combat task requirements, and constructing an expert system strategy generator. On the premise that the enemy clusters randomly select game strategies, the enemy clusters collect battlefield situation information, an expert system is utilized to generate game countermeasure strategies, a game countermeasure mechanism library is constructed, strategy selection and a battlefield evolution process are stored, and data support is provided for subsequent game algorithms and strategy model design.
In one embodiment of the application, a simulation instance is provided that applies unmanned game countermeasure policy generation to a typical scenario.
The specific scene settings are as follows. Enemy cluster: 10 unmanned aerial vehicles strike the 2 ground target areas on my side according to a fixed strategy from a certain height 1500 m away. My cluster: 20 unmanned aerial vehicles are in a loitering standby flight state at a preset position (500 m from the target areas, at a certain height). The interval between the 2 ground target areas on my side is 100 m, and the area of each ground target area is 10.
The parameters are set as follows. Field-of-view distance of my/enemy unmanned aerial vehicles: 150 m, 100 m; maximum speed of my/enemy unmanned aerial vehicles: 5, 4; the other unmanned aerial vehicle dynamic parameters are set according to typical parameters.
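For reproducibility, the scenario and parameter settings above can be gathered into one configuration structure; the layout and key names are assumptions, while the numeric values are those quoted in the text:

```python
# Scenario configuration sketch (structure assumed; values from the text).
SCENARIO = {
    "enemy": {"n_uavs": 10, "strategy": "fixed", "spawn_distance_m": 1500,
              "field_of_view_m": 100, "max_speed": 4},
    "friendly": {"n_uavs": 20, "state": "loiter_standby",
                 "spawn_distance_m": 500,
                 "field_of_view_m": 150, "max_speed": 5},
    "ground_targets": {"count": 2, "spacing_m": 100},
}

def outnumber_ratio(cfg):
    # friendly-to-enemy numerical advantage in this engagement
    return cfg["friendly"]["n_uavs"] / cfg["enemy"]["n_uavs"]
```

Loading such a structure in the initialization stage keeps the game countermeasure scene, cluster sizes and sensing limits in one place for the simulation loop.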
In the initialization stage, firstly, a game countermeasure scene is established, and then, a cluster dynamics nonlinear mathematical model is established for participating in a game countermeasure simulation process. And executing an initialization strategy by the clusters of the two parties after loading the dynamic model, updating the detection state in real time, and entering a game countermeasure link once the cluster individuals detect the other party cluster individuals.
In the game countermeasure link, the enemy cluster strategy library is fixed and assigned by groups in the initialization stage; after detecting a my-cluster individual, the enemy cluster wakes up the tasks within its groups and executes them. Unlike the enemy cluster strategy, the my cluster strategy is variable: firstly, the cluster behavior and track prediction module is used to realize behavior prediction according to the enemy cluster situation; on the basis of the prediction result, intent interpretation is carried out on the enemy cluster, and according to the enemy cluster situation, expert knowledge is used to assist the intelligently learned game strategy, and a dominant strategy is selected from the strategy library to play against the enemy cluster.
Fig. 4 is a schematic diagram of a first unmanned aerial vehicle game countermeasure simulation according to an embodiment of the present application. Fig. 5 is a schematic diagram of a second unmanned aerial vehicle game countermeasure simulation according to an embodiment of the present application. Fig. 6 is a schematic diagram of the change in number of unmanned aerial vehicles according to an embodiment of the present application. The gaming stage is shown in fig. 4 and fig. 5, where triangles represent my clusters and circles represent enemy clusters. At the beginning, the my cluster moves toward the enemy according to the preset patrol formation, and the enemy cluster forms a certain formation grouping to strike the ground targets (see fig. 4). In the game stage, the my cluster first finds enemy cluster individuals in its field of view and transfers the enemy information within the cluster, then realizes enemy cluster track prediction with the LSTM network, and cracks the opponent's intention based on the track prediction result; in this scene simulation it is deduced that the enemy cluster combat intention is to attack my ground targets along a fixed route (see fig. 5). Based on the pre-trained expert-knowledge-aided game technology, the my cluster derives from the mechanism library that the 2V1 capture-attack mode has the minimum cost, and autonomously forms an attack formation. During the strike process, the my game strategy changes dynamically; if a target is eliminated, the unmanned aerial vehicle automatically matches the next attack target according to the battle situation (in fig. 5, a dark triangle represents a my unmanned aerial vehicle that has entered the interception state, a light triangle represents a my unmanned aerial vehicle still in the standby state, a large black circle represents an enemy unmanned aerial vehicle that has been locked for attack, and a small light circle represents one that has not).
To further illustrate the effectiveness of the game strategy, as shown in fig. 6, the two sides begin to strike at about 120 s, and the enemy cluster is completely destroyed at about 180 s, with 10 of my aircraft remaining; the best strike at minimum cost is realized on the premise of ensuring the safety of the ground targets.
Aiming at problems such as low decision accuracy caused by deceptive enemy behavior in the unmanned aerial vehicle cluster game countermeasure process, the unmanned aerial vehicle game countermeasure strategy generation method provided by the embodiment of the application offers a technical scheme of predicting enemy behavior and interpreting intention before strategy generation. Dynamic prediction of unmanned aerial vehicle cluster behavior characteristics is realized based on neural network methods; a tactical intention reasoning model is then constructed based on the fuzzy neural network and the inverse reinforcement learning method, neural networks trained on data from different sources are combined to realize intelligent reasoning of enemy intention, and an equilibrium solution analysis and confidence of the enemy intention are provided, improving the accuracy of decision making. Aiming at the high dynamics of the battlefield environment and enemy strategies, most current game countermeasure strategy generation algorithms lack rapidity and pertinence; here, a game countermeasure strategy set is generated with expert knowledge assistance, a game countermeasure mechanism library is constructed in combination with typical task links to realize experience storage, an expert-knowledge-aided intelligent learning game strategy algorithm is constructed, and my strategies are dynamically adjusted, thereby realizing high real-time performance and strong pertinence of the decision process.
Based on the same inventive concept, the embodiment of the application also provides an unmanned plane game countermeasure policy generation system corresponding to the unmanned plane game countermeasure policy generation method, and because the principle of solving the problem of the device in the embodiment of the application is similar to that of the unmanned plane game countermeasure policy generation method in the embodiment of the application, the implementation of the device can refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an unmanned plane game countermeasure policy generation system according to an embodiment of the present application. As shown in fig. 7, the unmanned plane game countermeasure policy generation system includes:
the control module is used for determining the execution action of each red unmanned aerial vehicle at the next moment and controlling the red unmanned aerial vehicle to move according to the determined execution action, and comprises:
the track prediction unit 101 is configured to obtain a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle, and input a track prediction model trained in advance, so as to output a track prediction result of each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle;
the attack target prediction unit 102 is configured to input track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model, so as to output attack target prediction results of each target blue unmanned aerial vehicle;
The intention interpretation unit 103 is configured to input the track prediction results of all the target blue unmanned aerial vehicles and the attack target prediction results of all the target blue unmanned aerial vehicles into a pre-trained intention interpretation model, so as to output a blue unmanned aerial vehicle cluster state;
the game countermeasure unit 104 is configured to input the cluster state of the blue unmanned aerial vehicle, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model, so as to output a preferred motion of the red unmanned aerial vehicle, and control the red unmanned aerial vehicle to move according to the determined preferred motion; the target blue unmanned aerial vehicle is a blue unmanned aerial vehicle in the red unmanned aerial vehicle monitoring range, and other red unmanned aerial vehicles are red unmanned aerial vehicles in the red unmanned aerial vehicle monitoring range.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the application. As shown in fig. 8, the electronic device 800 includes a processor 810, a memory 820, and a bus 830.
The memory 820 stores machine-readable instructions executable by the processor 810, when the electronic device 800 is running, the processor 810 communicates with the memory 820 through the bus 830, and when the machine-readable instructions are executed by the processor 810, the steps of the method for generating the game countermeasure policy by the unmanned aerial vehicle in the method embodiment shown in fig. 1 can be executed, and the specific implementation is referred to the method embodiment and will not be described herein.
The embodiment of the present application further provides a computer readable storage medium, where a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the method for generating an unmanned plane game countermeasure policy in the method embodiment shown in fig. 1 can be executed, and a specific implementation manner may refer to the method embodiment and will not be described herein.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only memory (ROM), a random access memory (RandomAccessMemory, RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that the above examples are only specific embodiments of the present application, intended to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, any person skilled in the art may, within the technical scope disclosed herein, modify or readily conceive of changes to the technical solutions described in the foregoing embodiments, or substitute equivalents for some of their technical features. Such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be included in the protection scope of the present application. Therefore, the protection scope of the present application is subject to the protection scope of the claims.

Claims (9)

1. An unmanned aerial vehicle game countermeasure policy generation method, the method comprising:
for each red unmanned aerial vehicle, determining the execution action of the red unmanned aerial vehicle at the next moment and controlling the red unmanned aerial vehicle to move according to the determined execution action, in the following manner:
acquiring a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle, and inputting it into a pre-trained track prediction model to output a track prediction result of each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle;
inputting the track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model, so as to output an attack target prediction result for each target blue unmanned aerial vehicle;
inputting track prediction results of all target blue unmanned aerial vehicles and attack target prediction results of all target blue unmanned aerial vehicles into a pre-trained intent interpretation model to output a blue unmanned aerial vehicle cluster state, wherein the intent interpretation model is generated through training in the following way: acquiring a training data set, wherein the training data set comprises a plurality of groups of data samples, and each data sample comprises a plurality of sample blue unmanned aerial vehicle track sequences and corresponding blue unmanned aerial vehicle cluster states; the method comprises the steps of constructing a target fuzzy neural network model, wherein the target fuzzy neural network model comprises an input layer, a fuzzy reasoning layer and an output layer, the fuzzy layer comprises a preset number of fuzzy nodes determined according to the statistical quantity of combat modes, each fuzzy node corresponds to a different membership function, and the membership function has a formula as follows:
wherein the membership function formula, published as an image in this document, takes as input the track sequence of the input node to which the fuzzy node is connected and is parameterized by a first target parameter and a second target parameter; the fuzzy inference layer comprises a plurality of inference nodes, and the calculation rule formula of each inference node is likewise published as an image;
the output layer comprises a plurality of output nodes, and the definition function formula of each output node, likewise published as an image, involves a third target parameter;
inputting the training data set into the constructed target fuzzy neural network model, and adjusting the first target parameter, the second target parameter and the third target parameter in the target fuzzy neural network model based on a hybrid algorithm combining back propagation and the least squares method, so as to obtain the pre-trained intent interpretation model;
inputting the blue unmanned aerial vehicle cluster state, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model, so as to output the preferred action of the red unmanned aerial vehicle and control the red unmanned aerial vehicle to move according to the determined preferred action;
wherein the target blue unmanned aerial vehicle is a blue unmanned aerial vehicle within the monitoring range of the red unmanned aerial vehicle, and the other red unmanned aerial vehicles are red unmanned aerial vehicles within the monitoring range of the red unmanned aerial vehicle.
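By way of a non-limiting sketch only: the membership, inference and output formulas recited in claim 1 are published as images in this document, so the Gaussian membership form, the product firing rule, and the weighted-average output below are standard fuzzy-neural-network assumptions rather than the claimed formulas, with `c`, `sigma` and `w` standing in for the first, second and third target parameters.

```python
import math

def membership(x, c, sigma):
    """Degree of membership of scalar input x in one fuzzy node.
    Gaussian form is an assumption; c and sigma stand in for the first
    and second target parameters of the claim."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def firing_strength(memberships):
    """Inference node: product of the antecedent membership degrees of one rule
    (product rule assumed)."""
    return math.prod(memberships)

def output_node(strengths, w):
    """Output node: firing-strength-weighted average over the consequent
    parameters w (standing in for the third target parameter)."""
    total = sum(strengths)
    return sum(s * wi for s, wi in zip(strengths, w)) / total
```

Under these assumptions, the hybrid back-propagation plus least-squares procedure recited in the claim would tune `c`, `sigma` and `w` against the training data set of sample blue unmanned aerial vehicle track sequences and cluster states.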
2. The method of claim 1, wherein the cluster average field random game model outputs the preferred action of each red unmanned aerial vehicle by:
determining the action space of the red unmanned aerial vehicle cluster from a game countermeasure mechanism library according to the state of the blue unmanned aerial vehicle cluster;
determining Markov transition probability distribution of the red unmanned aerial vehicle according to the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment and the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment;
and, with the Markov transition probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables and the execution action of the red unmanned aerial vehicle as the dependent variable, solving for the execution action in the action space of the red unmanned aerial vehicle cluster that satisfies the Nash equilibrium condition, to serve as the preferred action of the red unmanned aerial vehicle.
3. The method according to claim 2, wherein the status of the blue unmanned aerial vehicle cluster at least includes formation, grouping and combat mode, and the step of determining the action space of the red unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the status of the blue unmanned aerial vehicle cluster specifically includes:
according to the blue unmanned aerial vehicle cluster state and the number of red unmanned aerial vehicles, matching a corresponding unmanned aerial vehicle cluster countermeasure scheme from the game countermeasure mechanism library, wherein the game countermeasure mechanism library comprises a plurality of unmanned aerial vehicle cluster countermeasure schemes, and each unmanned aerial vehicle cluster countermeasure scheme indicates, for each red unmanned aerial vehicle, the execution actions to be performed arranged in time sequence;
and determining, according to the matched unmanned aerial vehicle cluster countermeasure scheme, the execution actions arranged in time sequence corresponding to each red unmanned aerial vehicle, so as to generate the action space of the red unmanned aerial vehicle cluster.
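By way of a non-limiting sketch of the mechanism-library matching in claim 3: the actual contents of the game countermeasure mechanism library are not disclosed here, so the keys, cluster-state labels and per-UAV action lists below are hypothetical placeholders.

```python
# Hypothetical library: keyed by (blue cluster state, number of red UAVs),
# each entry maps a red UAV index to its time-ordered execution actions.
# All names and values are illustrative, not from the patent.
MECHANISM_LIBRARY = {
    ("pincer", 3): {0: ["climb", "flank_left"], 1: ["hold", "press"], 2: ["flank_right", "press"]},
    ("column", 3): {0: ["intercept"], 1: ["intercept"], 2: ["screen"]},
}

def red_action_space(blue_cluster_state, n_red):
    """Match a countermeasure scheme by blue cluster state and red UAV count,
    returning each red UAV's execution actions arranged in time sequence."""
    scheme = MECHANISM_LIBRARY.get((blue_cluster_state, n_red))
    if scheme is None:
        raise KeyError(f"no scheme for state {blue_cluster_state!r} with {n_red} red UAVs")
    return scheme
```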
4. The method of claim 2, wherein the relative position information comprises the line-of-sight angle between the two unmanned aerial vehicles, the entry angle between the velocity vector of the target unmanned aerial vehicle and the line of sight, the angle between the velocities of the two unmanned aerial vehicles, the distance between the two unmanned aerial vehicles, and the relative velocity between the two unmanned aerial vehicles, and the Markov transition probability distribution of the red unmanned aerial vehicle is determined by:
the total potential field energy of the red unmanned aerial vehicle is determined by the following formula
wherein the total potential field energy, whose formula is published as an image in this document, is the sum of the angle cooperative potential field, the distance cooperative potential field and the velocity cooperative potential field between this red unmanned aerial vehicle and each other red unmanned aerial vehicle in the red unmanned aerial vehicle set corresponding to this red unmanned aerial vehicle, plus the angle potential field, the distance potential field and the velocity potential field between this red unmanned aerial vehicle and each target blue unmanned aerial vehicle in the blue unmanned aerial vehicle set corresponding to this red unmanned aerial vehicle;
and determining the Markov transition probability distribution of the red unmanned aerial vehicle from this total potential field energy, the motion state of this red unmanned aerial vehicle at the current moment, and the execution action of this red unmanned aerial vehicle at the current moment, through a formula likewise published as an image.
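By way of a non-limiting sketch of claim 4: since the potential-field and transition formulas are published as images, the summation over cooperative and target-directed fields and the Boltzmann (softmax) form of the transition distribution below are assumptions, with the individual field functions supplied by the caller.

```python
import math

def total_energy(uav, teammates, targets, coop_fields, target_fields):
    """Total potential field energy: cooperative fields summed over other
    red UAVs plus target-directed fields summed over target blue UAVs."""
    energy = 0.0
    for mate in teammates:
        # e.g. angle / distance / velocity cooperative potential fields
        energy += sum(field(uav, mate) for field in coop_fields)
    for tgt in targets:
        # e.g. angle / distance / velocity potential fields toward a target
        energy += sum(field(uav, tgt) for field in target_fields)
    return energy

def transition_distribution(candidate_energies, beta=1.0):
    """Assumed Boltzmann form: candidate successor states with lower total
    potential field energy receive higher transition probability."""
    weights = [math.exp(-beta * e) for e in candidate_energies]
    z = sum(weights)
    return [w / z for w in weights]
```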
5. The method according to claim 4, wherein the step of solving for the execution action in the action space that satisfies the Nash equilibrium condition, as the preferred action of the red unmanned aerial vehicle, with the Markov transition probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables, specifically comprises:
determining the execution action satisfying the Nash equilibrium condition as the preferred action of the red unmanned aerial vehicle through a formula that is published as an image in this document, whose symbols include the preferred action of this red unmanned aerial vehicle, the discount rate, and the action space.
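By way of a non-limiting sketch of claim 5: with the Nash-equilibrium formula published only as an image, the discounted-expected-value criterion below (immediate reward plus the discount rate times the expected value of the successor state under the Markov transition distribution) is an assumption about its form.

```python
def preferred_action(state, actions, reward, transition, value, gamma=0.9):
    """Pick the action in the action space maximizing
    r(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s'),
    where transition(s, a) returns {successor_state: probability}."""
    def q(action):
        expected = sum(p * value(nxt)
                       for nxt, p in transition(state, action).items())
        return reward(state, action) + gamma * expected
    return max(actions, key=q)
```

In a full mean-field game solution each UAV's value function would itself depend on the population's action distribution; the one-step greedy choice here only illustrates the role of the discount rate and the transition distribution.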
6. The method as recited in claim 2, further comprising:
and storing the acquired cluster states and scene information of the red-side and blue-side unmanned aerial vehicles in the game countermeasure mechanism library, for optimizing and updating the intent interpretation model and the cluster average field random game model.
7. An unmanned aerial vehicle game countermeasure policy generation system, the system comprising:
the control module is used for determining the execution action of each red unmanned aerial vehicle at the next moment and controlling the red unmanned aerial vehicle to move according to the determined execution action, and the control module comprises:
the track prediction unit is used for acquiring a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle, inputting it into a pre-trained track prediction model, and outputting a track prediction result of each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle;
The attack target prediction unit is used for inputting track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model so as to output attack target prediction results of each target blue unmanned aerial vehicle;
the intent interpretation unit is used for inputting track prediction results of all the target blue unmanned aerial vehicles and attack target prediction results of all the target blue unmanned aerial vehicles into a pre-trained intent interpretation model so as to output a blue unmanned aerial vehicle cluster state, wherein the intent interpretation model is generated through training in the following way: acquiring a training data set, wherein the training data set comprises a plurality of groups of data samples, and each data sample comprises a plurality of sample blue unmanned aerial vehicle track sequences and corresponding blue unmanned aerial vehicle cluster states; the method comprises the steps of constructing a target fuzzy neural network model, wherein the target fuzzy neural network model comprises an input layer, a fuzzy reasoning layer and an output layer, the fuzzy layer comprises a preset number of fuzzy nodes determined according to the statistical quantity of combat modes, each fuzzy node corresponds to a different membership function, and the membership function has a formula as follows:
wherein the membership function formula, published as an image in this document, takes as input the track sequence of the input node to which the fuzzy node is connected and is parameterized by a first target parameter and a second target parameter; the fuzzy inference layer comprises a plurality of inference nodes, and the calculation rule formula of each inference node is likewise published as an image;
the output layer comprises a plurality of output nodes, and the definition function formula of each output node, likewise published as an image, involves a third target parameter;
inputting the training data set into the constructed target fuzzy neural network model, and adjusting the first target parameter, the second target parameter and the third target parameter in the target fuzzy neural network model based on a hybrid algorithm combining back propagation and the least squares method, so as to obtain the pre-trained intent interpretation model;
the game countermeasure unit is used for inputting the blue unmanned aerial vehicle cluster state, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model, so as to output the preferred action of the red unmanned aerial vehicle and control the red unmanned aerial vehicle to move according to the determined preferred action;
wherein the target blue unmanned aerial vehicle is a blue unmanned aerial vehicle within the monitoring range of the red unmanned aerial vehicle, and the other red unmanned aerial vehicles are red unmanned aerial vehicles within the monitoring range of the red unmanned aerial vehicle.
8. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor; when the electronic device is running, the processor and the memory communicate over the bus, and the processor executes the machine-readable instructions to perform the steps of the unmanned aerial vehicle game countermeasure policy generation method of any of claims 1 to 6.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the unmanned aerial vehicle game countermeasure policy generation method of any of claims 1 to 6.
CN202310628021.7A 2023-05-31 2023-05-31 Unmanned plane game countermeasure strategy generation method and system and electronic equipment Active CN116360503B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310628021.7A CN116360503B (en) 2023-05-31 2023-05-31 Unmanned plane game countermeasure strategy generation method and system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310628021.7A CN116360503B (en) 2023-05-31 2023-05-31 Unmanned plane game countermeasure strategy generation method and system and electronic equipment

Publications (2)

Publication Number Publication Date
CN116360503A CN116360503A (en) 2023-06-30
CN116360503B true CN116360503B (en) 2023-10-13

Family

ID=86922516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310628021.7A Active CN116360503B (en) 2023-05-31 2023-05-31 Unmanned plane game countermeasure strategy generation method and system and electronic equipment

Country Status (1)

Country Link
CN (1) CN116360503B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117150738B (en) * 2023-08-10 2024-05-10 中国船舶集团有限公司第七〇九研究所 Action direction pre-judging method under complex scene
CN116842127B (en) * 2023-08-31 2023-12-05 中国人民解放军海军航空大学 Self-adaptive auxiliary decision-making intelligent method and system based on multi-source dynamic data
CN117808596A (en) * 2024-01-26 2024-04-02 国金证券股份有限公司 Large language model multi-Agent system based on long-short-term memory module in securities and futures industry
CN117788164A (en) * 2024-01-26 2024-03-29 国金证券股份有限公司 Multi-Agent cooperative control algorithm and system for large language model in securities and futures industry

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269396A (en) * 2020-10-14 2021-01-26 北京航空航天大学 Unmanned aerial vehicle cluster cooperative confrontation control method for eagle pigeon-imitated intelligent game
CN112947541A (en) * 2021-01-15 2021-06-11 南京航空航天大学 Unmanned aerial vehicle intention track prediction method based on deep reinforcement learning
CN114063644A (en) * 2021-11-09 2022-02-18 北京航空航天大学 Unmanned combat aircraft air combat autonomous decision method based on pigeon flock reverse confrontation learning
CN114492749A (en) * 2022-01-24 2022-05-13 中国电子科技集团公司第五十四研究所 Time-limited red-blue countermeasure problem-oriented game decision method with action space decoupling function

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3063554B1 (en) * 2017-03-03 2021-04-02 Mbda France METHOD AND DEVICE FOR PREDICTING OPTIMAL ATTACK AND DEFENSE SOLUTIONS IN A MILITARY CONFLICT SCENARIO
CN113095481B (en) * 2021-04-03 2024-02-02 西北工业大学 Air combat maneuver method based on parallel self-game

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269396A (en) * 2020-10-14 2021-01-26 北京航空航天大学 Unmanned aerial vehicle cluster cooperative confrontation control method for eagle pigeon-imitated intelligent game
CN112947541A (en) * 2021-01-15 2021-06-11 南京航空航天大学 Unmanned aerial vehicle intention track prediction method based on deep reinforcement learning
CN114063644A (en) * 2021-11-09 2022-02-18 北京航空航天大学 Unmanned combat aircraft air combat autonomous decision method based on pigeon flock reverse confrontation learning
CN114492749A (en) * 2022-01-24 2022-05-13 中国电子科技集团公司第五十四研究所 Time-limited red-blue countermeasure problem-oriented game decision method with action space decoupling function

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on multi-UAV cooperative confrontation decision-making; Shao Jiang; Xu Yang; Luo Delin; Information and Control; Vol. 47, No. 3; pp. 347-354 *

Also Published As

Publication number Publication date
CN116360503A (en) 2023-06-30

Similar Documents

Publication Publication Date Title
CN116360503B (en) Unmanned plane game countermeasure strategy generation method and system and electronic equipment
Foerster et al. Stabilising experience replay for deep multi-agent reinforcement learning
Hu et al. Application of deep reinforcement learning in maneuver planning of beyond-visual-range air combat
Shantia et al. Connectionist reinforcement learning for intelligent unit micro management in starcraft
CN114115285B (en) Multi-agent emotion target path planning method and device
Feng et al. Towards human-like social multi-agents with memetic automaton
CN116661503B (en) Cluster track automatic planning method based on multi-agent safety reinforcement learning
Cao et al. Autonomous maneuver decision of UCAV air combat based on double deep Q network algorithm and stochastic game theory
Yuan et al. Research on UCAV maneuvering decision method based on heuristic reinforcement learning
CN113139331A (en) Air-to-air missile situation perception and decision method based on Bayesian network
Kong et al. Hierarchical multi‐agent reinforcement learning for multi‐aircraft close‐range air combat
CN112651486A (en) Method for improving convergence rate of MADDPG algorithm and application thereof
CN116136945A (en) Unmanned aerial vehicle cluster countermeasure game simulation method based on anti-facts base line
Feng et al. Multifunctional radar cognitive jamming decision based on dueling double deep Q-network
Xianyong et al. Research on maneuvering decision algorithm based on improved deep deterministic policy gradient
CN117313561B (en) Unmanned aerial vehicle intelligent decision model training method and unmanned aerial vehicle intelligent decision method
CN118416460A (en) Interpretable chess prediction method and system based on heterogeneous graph neural network
Liu et al. Evolutionary algorithm-based attack strategy with swarm robots in denied environments
Zhang et al. Situational continuity-based air combat autonomous maneuvering decision-making
Yang et al. WISDOM-II: A network centric model for warfare
Hou et al. Advances in memetic automaton: Toward human-like autonomous agents in complex multi-agent learning problems
Zhao et al. Deep Reinforcement Learning‐Based Air Defense Decision‐Making Using Potential Games
CN114565261B (en) GMQN-based collaborative combat control method, system, equipment and medium
CN115310257B (en) Situation estimation method and device based on artificial potential field
CN114757092A (en) System and method for training multi-agent cooperative communication strategy based on teammate perception

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant