CN116360503A - Unmanned aerial vehicle game countermeasure strategy generation method and system and electronic device - Google Patents


Info

Publication number
CN116360503A
Authority
CN
China
Prior art keywords: unmanned aerial vehicle, red, target, blue
Prior art date
Legal status
Granted
Application number
CN202310628021.7A
Other languages
Chinese (zh)
Other versions
CN116360503B (en)
Inventor
刘昊
吕金虎
王新迪
高庆
刘德元
钟森
Current Assignee
Beihang University
Academy of Mathematics and Systems Science of CAS
Original Assignee
Beihang University
Academy of Mathematics and Systems Science of CAS
Priority date
Filing date
Publication date
Application filed by Beihang University and Academy of Mathematics and Systems Science of CAS
Priority to CN202310628021.7A
Publication of CN116360503A
Application granted
Publication of CN116360503B
Legal status: Active

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10: Simultaneous control of position or course in three dimensions
    • G05D1/101: Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/106: Change initiated in response to external conditions, e.g. avoidance of elevated terrain or of no-fly zones
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides an unmanned aerial vehicle game countermeasure strategy generation method and system and an electronic device, relating to the technical field of aircraft control. The method inputs the trajectory prediction results of all target blue-side unmanned aerial vehicles and the attack target prediction results of all target blue-side unmanned aerial vehicles into a pre-trained intention interpretation model to output the blue-side unmanned aerial vehicle cluster state; it then inputs the blue-side cluster state, the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicles at the current moment, the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment, and the motion state of the red-side unmanned aerial vehicle at the current moment into a pre-trained cluster mean-field stochastic game model to output the preferred action of the red-side unmanned aerial vehicle, and controls the red-side unmanned aerial vehicle to move according to the determined preferred action, thereby improving the accuracy of unmanned aerial vehicle game defense strategy generation.

Description

Unmanned aerial vehicle game countermeasure strategy generation method and system and electronic device
Technical Field
The application relates to the technical field of aircraft control, and in particular to an unmanned aerial vehicle (UAV) game countermeasure strategy generation method and system and an electronic device.
Background
Unmanned aerial vehicle game defense strategy autonomous generation technology refers to technology by which an unmanned aerial vehicle cluster autonomously generates a game strategy in a combat environment, based on the battlefield situation and the perceived information of both friendly and enemy parties, so as to counter the enemy's combat intent, protect friendly ground targets and achieve friendly combat objectives. Existing strategy generation methods have low decision accuracy when the enemy unmanned aerial vehicle cluster employs deception and feint scenarios, so a strategy generation algorithm with higher decision accuracy is needed.
Disclosure of Invention
In view of this, the purpose of the present application is to provide an unmanned aerial vehicle game countermeasure strategy generation method and system and an electronic device, so as to improve the accuracy of unmanned aerial vehicle game defense strategy generation.
In a first aspect, the present application provides an unmanned aerial vehicle game countermeasure strategy generation method, the method comprising: for each red-side unmanned aerial vehicle, determining the execution action of the red-side unmanned aerial vehicle at the next moment and controlling the red-side unmanned aerial vehicle to move according to the determined execution action, in the following manner: acquiring a historical trajectory sequence of at least one target blue-side unmanned aerial vehicle collected by the red-side unmanned aerial vehicle, and inputting it into a pre-trained trajectory prediction model to output a trajectory prediction result for each target blue-side unmanned aerial vehicle corresponding to the red-side unmanned aerial vehicle; inputting the trajectory prediction results of all target blue-side unmanned aerial vehicles corresponding to all red-side unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model to output an attack target prediction result for each target blue-side unmanned aerial vehicle; inputting the trajectory prediction results of all target blue-side unmanned aerial vehicles and the attack target prediction results of all target blue-side unmanned aerial vehicles into a pre-trained intention interpretation model to output the blue-side unmanned aerial vehicle cluster state; and inputting the blue-side unmanned aerial vehicle cluster state, the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicles at the current moment, the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment, and the motion state of the red-side unmanned aerial vehicle at the current moment into a pre-trained cluster mean-field stochastic game model to output the preferred action of the red-side unmanned aerial vehicle, and controlling the red-side unmanned aerial vehicle to move according to the determined preferred action; wherein a target blue-side unmanned aerial vehicle is a blue-side unmanned aerial vehicle within the red-side unmanned aerial vehicle's monitoring range, and the other red-side unmanned aerial vehicles are red-side unmanned aerial vehicles within the red-side unmanned aerial vehicle's monitoring range.
Preferably, the cluster mean-field stochastic game model outputs the preferred action of each red-side unmanned aerial vehicle by: determining the action space of the red-side unmanned aerial vehicle cluster from a game countermeasure mechanism library according to the blue-side unmanned aerial vehicle cluster state; determining the Markov transition probability distribution of the red-side unmanned aerial vehicle according to the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicles at the current moment and the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment; and, taking the Markov transition probability distribution of the red-side unmanned aerial vehicle and the motion state of the red-side unmanned aerial vehicle at the current moment as independent variables and the execution action of the red-side unmanned aerial vehicle as the dependent variable, solving for the execution action in the action space of the red-side unmanned aerial vehicle cluster that satisfies the Nash equilibrium condition, as the preferred action of the red-side unmanned aerial vehicle.
Preferably, the blue-side unmanned aerial vehicle cluster state comprises at least a formation, a grouping and a combat mode, and the step of determining the action space of the red-side unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the blue-side unmanned aerial vehicle cluster state specifically comprises: matching a corresponding unmanned aerial vehicle cluster countermeasure scheme from the game countermeasure mechanism library according to the blue-side unmanned aerial vehicle cluster state and the number of red-side unmanned aerial vehicles, wherein the game countermeasure mechanism library comprises a plurality of unmanned aerial vehicle cluster countermeasure schemes, each of which instructs every red-side unmanned aerial vehicle to execute actions arranged in time order; and determining, according to the matched unmanned aerial vehicle cluster countermeasure scheme, the time-ordered execution actions corresponding to each red-side unmanned aerial vehicle, so as to generate the action space of the red-side unmanned aerial vehicle cluster.
Preferably, the relative position information comprises the line-of-sight angle between the two unmanned aerial vehicles, the entry angle between the target unmanned aerial vehicle's velocity vector and the line of sight, the included angle between the velocities of the two unmanned aerial vehicles, the distance between the two unmanned aerial vehicles, and the relative speed between the two unmanned aerial vehicles, and the Markov transition probability distribution of the red-side unmanned aerial vehicle is determined as follows:
the total potential field energy \(E_i\) of the red-side unmanned aerial vehicle \(i\) is determined as the sum of its cooperative and adversarial potential field components (the component equations are supplied as figures in the original):

\[E_i=\sum_{j}\left(U_{ij}^{\theta}+U_{ij}^{d}+U_{ij}^{v}\right)+\sum_{k}\left(V_{ik}^{\theta}+V_{ik}^{d}+V_{ik}^{v}\right)\]

where \(U_{ij}^{\theta}\) is the angle cooperative potential field between this red-side unmanned aerial vehicle \(i\) and another red-side unmanned aerial vehicle \(j\), \(U_{ij}^{d}\) is the distance cooperative potential field between them, \(U_{ij}^{v}\) is the velocity cooperative potential field between them, \(V_{ik}^{\theta}\) is the angle potential field between this red-side unmanned aerial vehicle \(i\) and the target blue-side unmanned aerial vehicle \(k\), \(V_{ik}^{d}\) is the distance potential field between them, and \(V_{ik}^{v}\) is the velocity potential field between them;

and the Markov transition probability distribution of the red-side unmanned aerial vehicle is determined as

\[P\left(s_i^{t+1}\mid s_i^{t},a_i^{t}\right)\]

where \(s_i^{t}\) is the motion state of this red-side unmanned aerial vehicle \(i\) at the current moment \(t\), and \(a_i^{t}\) is the execution action of this red-side unmanned aerial vehicle \(i\) at the current moment \(t\).
Preferably, the step of taking the potential-field probability distribution of the red-side unmanned aerial vehicle and the motion state of the red-side unmanned aerial vehicle at the current moment as independent variables and the execution action of the red-side unmanned aerial vehicle as the dependent variable, and solving for the execution action in the action space that satisfies the Nash equilibrium condition as the preferred action of the red-side unmanned aerial vehicle, specifically comprises:

determining the execution action satisfying the Nash equilibrium condition as the preferred action of the red-side unmanned aerial vehicle through a value-maximization formula (the original formulas appear as figures):

\[a_i^{*}=\arg\max_{a_i}V_i\left(s_i,a_i\right)\]

where \(V_i\) is the value function of this red-side unmanned aerial vehicle \(i\), \(a_i^{*}\) is the preferred action of this red-side unmanned aerial vehicle \(i\), and the discount rate \(\gamma\in(0,1)\).
Preferably, the method further comprises storing the acquired cluster states of both sides' unmanned aerial vehicles and the corresponding scene information in the game countermeasure mechanism library, for optimizing and updating the intention interpretation model and the cluster mean-field stochastic game model.
Preferably, the intention interpretation model is generated by training in the following way: acquiring a training data set, wherein the training data set comprises a plurality of groups of data samples, each data sample comprising the trajectory sequences of a plurality of sample blue-side unmanned aerial vehicles and the corresponding blue-side unmanned aerial vehicle cluster state; and constructing a target fuzzy neural network model comprising an input layer, a fuzzification layer, a fuzzy inference layer and an output layer, wherein the fuzzification layer comprises a preset number of fuzzy nodes determined according to the statistical number of combat modes, each fuzzy node corresponding to a different membership function (rendered as a figure in the original; a Gaussian form with the two target parameters is assumed here):

\[\mu_{ij}(x_i)=\exp\left(-\frac{(x_i-c_{ij})^{2}}{\sigma_{ij}^{2}}\right),\quad i=1,\dots,n,\ j=1,\dots,m\]

where \(x_i\) is the trajectory sequence corresponding to input node \(i\) in the input layer, \(\mu_{ij}\) is the membership function corresponding to fuzzy node \(j\) connected to input node \(i\), \(c_{ij}\) is the first target parameter, and \(\sigma_{ij}\) is the second target parameter; the fuzzy inference layer comprises a plurality of inference nodes, the calculation rule of each inference node being

\[w_{j}=\prod_{i}\mu_{ij}(x_i)\]

and the output layer comprises a plurality of output nodes, the clarification (defuzzification) function of each output node being

\[y_{k}=\frac{\sum_{j}p_{kj}\,w_{j}}{\sum_{j}w_{j}}\]

where \(p_{kj}\) is the third target parameter; and inputting the training data set into the constructed target fuzzy neural network model, and adjusting the first, second and third target parameters in the model based on a hybrid algorithm combining back propagation and least squares, so as to obtain the pre-trained intention interpretation model.
In a second aspect, the present application provides an unmanned aerial vehicle game countermeasure strategy generation system, the system comprising: a control module, configured to determine, for each red-side unmanned aerial vehicle, the execution action of the red-side unmanned aerial vehicle at the next moment and to control the red-side unmanned aerial vehicle to move according to the determined execution action, the control module comprising: a trajectory prediction unit, configured to acquire a historical trajectory sequence of at least one target blue-side unmanned aerial vehicle collected by the red-side unmanned aerial vehicle and input it into a pre-trained trajectory prediction model, so as to output a trajectory prediction result for each target blue-side unmanned aerial vehicle corresponding to the red-side unmanned aerial vehicle; an attack target prediction unit, configured to input the trajectory prediction results of all target blue-side unmanned aerial vehicles corresponding to all red-side unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model, so as to output an attack target prediction result for each target blue-side unmanned aerial vehicle; an intention interpretation unit, configured to input the trajectory prediction results of all target blue-side unmanned aerial vehicles and the attack target prediction results of all target blue-side unmanned aerial vehicles into a pre-trained intention interpretation model, so as to output the blue-side unmanned aerial vehicle cluster state; and a game countermeasure unit, configured to input the blue-side unmanned aerial vehicle cluster state, the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicles at the current moment, the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment, and the motion state of the red-side unmanned aerial vehicle at the current moment into a pre-trained cluster mean-field stochastic game model, so as to output the preferred action of the red-side unmanned aerial vehicle and control the red-side unmanned aerial vehicle to move according to the determined preferred action; wherein a target blue-side unmanned aerial vehicle is a blue-side unmanned aerial vehicle within the red-side unmanned aerial vehicle's monitoring range, and the other red-side unmanned aerial vehicles are red-side unmanned aerial vehicles within the red-side unmanned aerial vehicle's monitoring range.
In a third aspect, the present application further provides an electronic device, comprising: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the unmanned aerial vehicle game countermeasure strategy generation method described above.
In a fourth aspect, the present application further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the unmanned aerial vehicle game countermeasure strategy generation method described above.
In the method, for each red-side unmanned aerial vehicle, the execution action of the red-side unmanned aerial vehicle at the next moment is determined and the red-side unmanned aerial vehicle is controlled to move accordingly, by: acquiring a historical trajectory sequence of at least one target blue-side unmanned aerial vehicle collected by the red-side unmanned aerial vehicle and inputting it into a pre-trained trajectory prediction model to output a trajectory prediction result for each target blue-side unmanned aerial vehicle; inputting the trajectory prediction results of all target blue-side unmanned aerial vehicles corresponding to all red-side unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model to output an attack target prediction result for each target blue-side unmanned aerial vehicle; inputting the trajectory prediction results and attack target prediction results of all target blue-side unmanned aerial vehicles into a pre-trained intention interpretation model to output the blue-side unmanned aerial vehicle cluster state; and inputting the blue-side unmanned aerial vehicle cluster state, the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicles at the current moment, the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment, and the motion state of the red-side unmanned aerial vehicle at the current moment into a pre-trained cluster mean-field stochastic game model to output the preferred action of the red-side unmanned aerial vehicle, and controlling the red-side unmanned aerial vehicle to move according to the determined preferred action, wherein a target blue-side unmanned aerial vehicle is a blue-side unmanned aerial vehicle within the red-side unmanned aerial vehicle's monitoring range and the other red-side unmanned aerial vehicles are red-side unmanned aerial vehicles within that monitoring range. Because the behavior and intent of the blue-side unmanned aerial vehicles are analyzed through deep learning before strategy generation, the game countermeasure strategy of the red side is dynamically generated and adjusted, improving both the accuracy and the timeliness of decisions.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be regarded as limiting the scope; other related drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of an unmanned aerial vehicle game countermeasure strategy generation method according to an embodiment of the present application;

Fig. 2 is a flowchart of the steps for determining a preferred action according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a game countermeasure strategy generation scenario according to an embodiment of the present application;

Fig. 4 is a schematic diagram of a first unmanned aerial vehicle game countermeasure simulation according to an embodiment of the present application;

Fig. 5 is a schematic diagram of a second unmanned aerial vehicle game countermeasure simulation according to an embodiment of the present application;

Fig. 6 is a schematic diagram of the change in the numbers of unmanned aerial vehicles according to an embodiment of the present application;

Fig. 7 is a block diagram of an unmanned aerial vehicle game countermeasure strategy generation system according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. The components of the embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Accordingly, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the application, but merely represents selected embodiments. Every other embodiment obtained by a person skilled in the art based on the embodiments of the present application without inventive effort falls within the protection scope of the present application.
First, an application scenario to which the present application is applicable will be described. The method can be applied to game countermeasure strategy generation for unmanned aerial vehicle cooperative combat, and is particularly suitable for control strategy generation for rotor unmanned aerial vehicles.
Research shows that unmanned aerial vehicle game defense strategy autonomous generation technology refers to technology by which an unmanned aerial vehicle cluster autonomously generates a game strategy in a combat environment, based on the battlefield situation and the perceived information of both friendly and enemy parties, so as to counter the enemy's combat intent, protect friendly ground targets and achieve friendly combat objectives. In the prior art, for example, Lei et al. propose an optimal strategy based on complete information and Markov models, realizing attack and defense against a moving target; Carter et al. consider the dynamic switching of game models under different attack scenarios and thereby propose a strategy generation algorithm; and Garcia et al. model the unmanned aerial vehicle cluster game problem as a differential game of cluster strike, establishing a whole-process performance function and a strike-capability evaluation function to derive the cluster counter-strike guidance law in that scenario.
Based on the above, the embodiments of the present application provide an unmanned aerial vehicle game countermeasure strategy generation method and system and an electronic device, so as to improve the accuracy of unmanned aerial vehicle game defense strategy generation.

Referring to fig. 1 and fig. 3, fig. 1 is a flowchart of an unmanned aerial vehicle game countermeasure strategy generation method according to an embodiment of the present application, and fig. 3 is a schematic diagram of a game countermeasure strategy generation scenario according to an embodiment of the present application. As shown in fig. 1, the unmanned aerial vehicle game countermeasure strategy generation method provided by the embodiments of the present application includes:
for each red-side unmanned aerial vehicle, determining the execution action of the red-side unmanned aerial vehicle at the next moment, and controlling the red-side unmanned aerial vehicle to move according to the determined execution action, in the following manner:

Here, in one embodiment of the present application, the execution actions of the red-side unmanned aerial vehicle change dynamically; that is, the execution action of each red-side unmanned aerial vehicle at the next moment is determined based on the historical data collected by that red-side unmanned aerial vehicle. The historical data here include, but are not limited to, the states of the blue-side unmanned aerial vehicles and the scene information collected by the red-side unmanned aerial vehicle over a period of time. Actions here include, but are not limited to, the speed and attitude of the unmanned aerial vehicle, whether to attack, and so on.
S101: acquiring a historical trajectory sequence of at least one target blue-side unmanned aerial vehicle collected by the red-side unmanned aerial vehicle, and inputting it into a pre-trained trajectory prediction model to output a trajectory prediction result for each target blue-side unmanned aerial vehicle corresponding to the red-side unmanned aerial vehicle.

In step S101, the trajectory prediction model is trained and generated based on an LSTM (Long Short-Term Memory) neural network model. The input of the trajectory prediction model is the trajectory sequence of each target blue-side unmanned aerial vehicle, where a target blue-side unmanned aerial vehicle is any blue-side unmanned aerial vehicle monitored within the sensing range of the red-side unmanned aerial vehicle. The historical trajectory sequence may be continuous or discrete, such as the continuous three-dimensional position coordinates of the target blue-side unmanned aerial vehicle over the past 10 seconds, or its three-dimensional position coordinates at the 8th, 5th and 3rd seconds. The output of the trajectory prediction model is the predicted trajectory sequence corresponding to each input target blue-side unmanned aerial vehicle, for example the three-dimensional position coordinates within the next 5 seconds or within the next 3 seconds; the shorter the predicted trajectory sequence, the more it benefits the prediction in step S102.
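To make step S101 concrete, the following is a minimal sketch of an LSTM trajectory predictor of the kind described here, assuming PyTorch; the hidden size, prediction horizon, sampling rate and all names are illustrative assumptions rather than the patented implementation.

```python
# Illustrative sketch only: an LSTM that maps a history of 3-D positions of one
# target blue-side UAV to a short predicted trajectory.
import torch
import torch.nn as nn

class TrajectoryPredictor(nn.Module):
    def __init__(self, hidden: int = 64, horizon: int = 5):
        super().__init__()
        self.horizon = horizon                       # number of future steps to predict
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3 * horizon)   # decode future (x, y, z) positions

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, T, 3) past positions -> (batch, horizon, 3) predicted positions
        _, (h_n, _) = self.lstm(history)
        return self.head(h_n[-1]).view(-1, self.horizon, 3)

# usage: 10 s of past positions sampled at 1 Hz for one target blue-side UAV
model = TrajectoryPredictor()
future = model(torch.randn(1, 10, 3))   # (1, 5, 3)
```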
S102: inputting the trajectory prediction results of all target blue-side unmanned aerial vehicles corresponding to all red-side unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model, so as to output an attack target prediction result for each target blue-side unmanned aerial vehicle.

The attack target prediction model is trained and generated using an IDCNN (Iterated Dilated Convolutional Neural Network) model. The input of the attack target prediction model is the trajectory prediction result of each target blue-side unmanned aerial vehicle output by the trajectory prediction model. The output of the attack target prediction model is the attack target prediction result of each target blue-side unmanned aerial vehicle; for example, the attack target of target blue-side unmanned aerial vehicle A is red-side unmanned aerial vehicle B, the attack target of target blue-side unmanned aerial vehicle C is red-side ground target D, and so on. An attack target comprises at least a red-side unmanned aerial vehicle or a red-side ground target.
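As an illustration of step S102, below is a minimal sketch of a one-dimensional iterated-dilated-convolution classifier over a predicted trajectory, again assuming PyTorch; the dilation rates, channel count and number of candidate targets are assumptions.

```python
# Illustrative sketch only: stacked dilated 1-D convolutions widen the receptive
# field over the predicted trajectory, followed by pooling and a linear classifier.
import torch
import torch.nn as nn

class AttackTargetClassifier(nn.Module):
    def __init__(self, n_targets: int, channels: int = 32):
        super().__init__()
        layers, in_ch = [], 3
        for d in (1, 2, 4):   # iterated dilation rates
            layers += [nn.Conv1d(in_ch, channels, kernel_size=3, dilation=d, padding=d),
                       nn.ReLU()]
            in_ch = channels
        self.conv = nn.Sequential(*layers)
        self.fc = nn.Linear(channels, n_targets)

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        # traj: (batch, horizon, 3) predicted positions -> (batch, n_targets) logits
        x = self.conv(traj.transpose(1, 2))   # Conv1d expects (batch, channels, time)
        return self.fc(x.mean(dim=2))         # pool over time, then classify

logits = AttackTargetClassifier(n_targets=4)(torch.randn(1, 5, 3))
```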
S103: inputting the trajectory prediction results of all target blue-side unmanned aerial vehicles and the attack target prediction results of all target blue-side unmanned aerial vehicles into a pre-trained intention interpretation model, so as to output the blue-side unmanned aerial vehicle cluster state.

The intention interpretation model is generated by training a fuzzy neural network model. The input of the intention interpretation model is the behavior sequence of each target blue-side unmanned aerial vehicle determined from the outputs of the trajectory prediction model and the attack target prediction model. The output of the intention interpretation model is the blue-side unmanned aerial vehicle cluster state, which comprises at least formation, grouping and combat-mode information; for example, a first formation groups to strike our ground targets, a second formation clusters to reconnoiter our information, and a third formation clusters to jam our unmanned aerial vehicle B.
S104: inputting the blue-side unmanned aerial vehicle cluster state, the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicles at the current moment, the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment, and the motion state of the red-side unmanned aerial vehicle at the current moment into a pre-trained cluster mean-field stochastic game model, so as to output the preferred action of the red-side unmanned aerial vehicle and control the red-side unmanned aerial vehicle to move according to the determined preferred action.

A target blue-side unmanned aerial vehicle is a blue-side unmanned aerial vehicle within the red-side unmanned aerial vehicle's monitoring range, and the other red-side unmanned aerial vehicles are red-side unmanned aerial vehicles within the red-side unmanned aerial vehicle's monitoring range.

The cluster mean-field stochastic game model is generated by training, based on reinforcement learning, an expert-knowledge-assisted mean-field cluster game strategy generation algorithm. The inputs of the cluster mean-field stochastic game model include the friend-foe situation (the relative geometric information between pairs of aircraft), the enemy unmanned aerial vehicle cluster state, and so on. The output of the cluster mean-field stochastic game model is the preferred action to be executed by the red-side unmanned aerial vehicle at the next moment.
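The following sketch shows how the four pre-trained models could be chained for one red-side unmanned aerial vehicle at one decision step; every interface here, and the stub models used to make the sketch run, are assumptions made for illustration.

```python
# Illustrative sketch only: per-UAV decision pipeline S101 -> S102 -> S103 -> S104.
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class UAV:
    position: tuple   # (x, y, z)
    velocity: tuple   # (vx, vy, vz)
    history: list     # past positions

def relative_info(uav: UAV, others: Sequence[UAV]) -> list:
    # relative position vectors to every UAV within the monitoring range
    return [tuple(o - s for o, s in zip(other.position, uav.position))
            for other in others]

def decide_action(red: UAV, blues: List[UAV], reds: List[UAV],
                  traj_model: Callable, target_model: Callable,
                  intent_model: Callable, game_model: Callable):
    trajs = [traj_model(b.history) for b in blues]   # S101: trajectory prediction
    targets = [target_model(t) for t in trajs]       # S102: attack-target prediction
    cluster_state = intent_model(trajs, targets)     # S103: intention interpretation
    return game_model(cluster_state,                 # S104: mean-field game model
                      relative_info(red, blues),
                      relative_info(red, reds),
                      (red.position, red.velocity))

# stub models so the sketch runs end to end
blue = UAV((100.0, 0.0, 50.0), (-4.0, 0.0, 0.0),
           [(110.0, 0.0, 50.0), (105.0, 0.0, 50.0), (100.0, 0.0, 50.0)])
red = UAV((0.0, 0.0, 50.0), (5.0, 0.0, 0.0), [])
action = decide_action(red, [blue], [],
                       traj_model=lambda h: h[-1],
                       target_model=lambda t: "ground_target_A",
                       intent_model=lambda ts, gs: "strike_formation",
                       game_model=lambda *args: "intercept")
```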
As shown in fig. 2, fig. 2 is a flowchart of the steps for determining a preferred action according to an embodiment of the present application. Specifically, the cluster mean-field stochastic game model outputs the preferred action of each red-side unmanned aerial vehicle as follows:
S201: determining the action space of the red-side unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the blue-side unmanned aerial vehicle cluster state.
The blue-side unmanned aerial vehicle cluster state comprises at least a formation, a grouping and a combat mode, and the step of determining the action space of the red-side unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the blue-side unmanned aerial vehicle cluster state specifically comprises the following:

matching a corresponding unmanned aerial vehicle cluster countermeasure scheme from the game countermeasure mechanism library according to the blue-side unmanned aerial vehicle cluster state and the number of red-side unmanned aerial vehicles, wherein the game countermeasure mechanism library comprises a plurality of unmanned aerial vehicle cluster countermeasure schemes and each scheme instructs every red-side unmanned aerial vehicle to execute actions arranged in time order; and determining, according to the matched unmanned aerial vehicle cluster countermeasure scheme, the time-ordered execution actions corresponding to each red-side unmanned aerial vehicle, so as to generate the action space of the red-side unmanned aerial vehicle cluster.

The game countermeasure mechanism library may be pre-established; it comprises a plurality of unmanned aerial vehicle cluster countermeasure schemes, each of which instructs every red-side unmanned aerial vehicle in the cluster to execute actions in time order, for example moving toward the enemy in a predetermined patrol formation, or intercepting and attacking the enemy in a 2-versus-1 capture pattern.
The unmanned aerial vehicle cluster countermeasure scheme here may be expressed as \(\Gamma=\{\tau_1,\tau_2,\dots,\tau_N\}\), where \(\tau_i\) denotes the time-ordered sequence of execution actions of the \(i\)-th red-side unmanned aerial vehicle (the original notation is rendered as figures). The action space here can be expressed as \(\mathcal{A}=\mathcal{A}_1\times\mathcal{A}_2\times\dots\times\mathcal{A}_N\). At any decision moment, the action \(a_i\) to be taken by unmanned aerial vehicle \(i\) relates not only to itself but also to the whole cluster.
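A minimal sketch of how such a scheme might be matched from the game countermeasure mechanism library and expanded into per-vehicle, time-ordered actions follows; the library keys, scheme contents and action names are invented for illustration.

```python
# Illustrative sketch only: the library maps (blue cluster state, red UAV count)
# to a scheme giving each red-side UAV its time-ordered execution actions.
MECHANISM_LIBRARY = {
    ("strike_formation", 3): {"uav_1": ["intercept", "attack"],
                              "uav_2": ["intercept", "attack"],
                              "uav_3": ["patrol", "support"]},
    ("recon_formation", 3):  {"uav_1": ["jam"],
                              "uav_2": ["shadow"],
                              "uav_3": ["patrol"]},
}

def red_cluster_action_space(blue_state: str, n_red: int) -> dict:
    # the matched scheme fixes the time-ordered actions of every red-side UAV,
    # which together constitute the action space of the red-side cluster
    return MECHANISM_LIBRARY[(blue_state, n_red)]

print(red_cluster_action_space("strike_formation", 3))
```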
S202: determining the Markov transition probability distribution of the red-side unmanned aerial vehicle according to the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicles at the current moment and the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment.

The relative position information comprises the line-of-sight angle between the two unmanned aerial vehicles, the entry angle between the target unmanned aerial vehicle's velocity vector and the line of sight, the included angle between the velocities of the two unmanned aerial vehicles, the distance between the two unmanned aerial vehicles, and the relative speed between the two unmanned aerial vehicles. The Markov transition probability distribution of the red-side unmanned aerial vehicle is determined as follows:
the total potential field energy \(E_i\) of the red-side unmanned aerial vehicle \(i\) is determined as the sum of its cooperative and adversarial potential field components (the component equations are supplied as figures in the original):

\[E_i=\sum_{j}\left(U_{ij}^{\theta}+U_{ij}^{d}+U_{ij}^{v}\right)+\sum_{k}\left(V_{ik}^{\theta}+V_{ik}^{d}+V_{ik}^{v}\right)\]

where \(U_{ij}^{\theta}\) is the angle cooperative potential field between this red-side unmanned aerial vehicle \(i\) and another red-side unmanned aerial vehicle \(j\), \(U_{ij}^{d}\) is the distance cooperative potential field between them, \(U_{ij}^{v}\) is the velocity cooperative potential field between them, \(V_{ik}^{\theta}\) is the angle potential field between this red-side unmanned aerial vehicle \(i\) and the target blue-side unmanned aerial vehicle \(k\), \(V_{ik}^{d}\) is the distance potential field between them, and \(V_{ik}^{v}\) is the velocity potential field between them;

and the Markov transition probability distribution of the red-side unmanned aerial vehicle is determined as

\[P\left(s_i^{t+1}\mid s_i^{t},a_i^{t}\right)\]

where \(s_i^{t}\) is the motion state of this red-side unmanned aerial vehicle \(i\) at the current moment \(t\), and \(a_i^{t}\) is the execution action of this red-side unmanned aerial vehicle \(i\) at the current moment \(t\).
The mean-field cluster game strategy generation algorithm here may be assisted by expert knowledge and is based on reinforcement learning. The perception information acquired by unmanned aerial vehicle \(i\) is mainly the relative geometric information between pairs of aircraft, which can be expressed as a tuple (rendered as a figure in the original) whose elements respectively represent: the line-of-sight angle between unmanned aerial vehicle \(i\)'s velocity vector and the line of sight between the two aircraft, the target entry angle between the target unmanned aerial vehicle's velocity vector and the line of sight, the distance between the two aircraft, the included angle between the velocities of the two aircraft, and the relative speed of the two aircraft.
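As an illustration, these five quantities can be computed for a planar unmanned aerial vehicle pair as follows; the symbol names and the entry-angle convention (target velocity measured against the line of sight back toward the observer) are assumptions.

```python
# Illustrative sketch only: relative geometry between UAV i (observer) and UAV j.
import numpy as np

def _angle(a, b):
    cosv = np.clip(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0)
    return np.arccos(cosv)

def relative_geometry(p_i, v_i, p_j, v_j):
    los = p_j - p_i                       # line-of-sight vector from i to j
    q = _angle(v_i, los)                  # line-of-sight angle (own velocity vs. LOS)
    eta = _angle(v_j, -los)               # target entry angle (target velocity vs. LOS)
    theta = _angle(v_i, v_j)              # included angle between the two velocities
    d = np.linalg.norm(los)               # distance between the two aircraft
    v_rel = np.linalg.norm(v_i - v_j)     # relative speed
    return q, eta, theta, d, v_rel

geo = relative_geometry(np.array([0.0, 0.0]), np.array([5.0, 0.0]),
                        np.array([100.0, 20.0]), np.array([-4.0, 0.0]))
```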
Consider a cluster of \(N\) unmanned aerial vehicles. Denote the set of the \(M\) friendly unmanned aerial vehicles nearest to the \(i\)-th unmanned aerial vehicle as \(\mathcal{N}_i\), and the set of the \(K\) enemy unmanned aerial vehicles nearest to it as \(\mathcal{M}_i\) (the original notation is rendered as figures). Introducing the potential field concept, the field radiated onto our \(i\)-th unmanned aerial vehicle by unmanned aerial vehicle \(j\) in our cluster is defined as the cooperative potential field \(U_{ij}=(U_{ij}^{\theta},U_{ij}^{d},U_{ij}^{v})\), whose components respectively represent angular consistency, distance consistency and velocity consistency. At the same time, the field radiated onto our \(i\)-th unmanned aerial vehicle by unmanned aerial vehicle \(k\) in the enemy cluster is defined as the adversarial (dynamic) potential field \(V_{ik}=(V_{ik}^{\theta},V_{ik}^{d},V_{ik}^{v})\), whose components respectively represent enemy angle, distance and velocity dynamics.
This step establishes the interaction among cluster individuals: from the relative position relationship with its neighborhood individuals, each cluster individual's total potential field energy and quantified situation can be calculated, and a maneuver strategy adopted accordingly; a numerical sketch of this computation is given below.
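The sketch referenced above illustrates one way this computation could look: toy shapings of the angle, distance and velocity components are summed into a total potential field energy, and per-action energies are turned into a probability distribution with a softmax. The component forms and constants are assumptions; the patent supplies the actual formulas as figures.

```python
# Illustrative sketch only: total potential field energy and a softmax transition
# distribution over candidate actions (higher energy -> higher probability here,
# which is a design assumption of this toy).
import numpy as np

def component_fields(rel_angle, rel_dist, rel_speed):
    u_angle = np.cos(rel_angle)                        # aligned headings score higher
    u_dist = np.exp(-(rel_dist - 30.0) ** 2 / 200.0)   # prefer a nominal spacing
    u_speed = np.exp(-rel_speed ** 2 / 10.0)           # prefer matched speeds
    return u_angle + u_dist + u_speed

def total_energy(friends, enemies):
    # friends, enemies: iterables of (rel_angle, rel_dist, rel_speed) tuples
    coop = sum(component_fields(*f) for f in friends)   # cooperative potential fields
    adv = sum(component_fields(*e) for e in enemies)    # adversarial potential fields
    return coop + adv

def transition_distribution(energy_per_action):
    e = np.asarray(energy_per_action)
    p = np.exp(e - e.max())                             # numerically stable softmax
    return p / p.sum()

probs = transition_distribution(
    [total_energy([(0.1, 28.0, 0.5)], [(0.8, 90.0, 2.0)]),
     total_energy([(0.1, 35.0, 0.5)], [(0.6, 80.0, 2.0)])])
```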
S203: taking the Markov transition probability distribution of the red-side unmanned aerial vehicle and the motion state of the red-side unmanned aerial vehicle at the current moment as independent variables and the execution action of the red-side unmanned aerial vehicle as the dependent variable, and solving for the execution action in the action space that satisfies the Nash equilibrium condition, as the preferred action of the red-side unmanned aerial vehicle.

Specifically, taking the potential-field probability distribution of the red-side unmanned aerial vehicle and its motion state at the current moment as independent variables and its execution action as the dependent variable, solving for the Nash-equilibrium execution action in the action space comprises the following:
determining the execution action satisfying the Nash equilibrium condition as the preferred action of the red-side unmanned aerial vehicle through a value-maximization formula (the original formulas appear as figures):

\[a_i^{*}=\arg\max_{a_i}V_i\left(s_i,a_i\right)\]

where \(V_i\) is the value function of this red-side unmanned aerial vehicle \(i\), \(a_i^{*}\) is its preferred action, and the discount rate \(\gamma\in(0,1)\).
Here \(\mu\) is defined as the overall influence of the cluster on the individual, i.e., the probability distribution produced by the potential field, and the Markov transition probability of the individual state is defined as follows (the original formula is a figure):

\[P\left(s^{t+1}\mid s^{t},a^{t},\mu^{t}\right)\]

where \(s^{t}\) is the current unmanned aerial vehicle state. For a policy \(\pi_i\) and discount rate \(\gamma\), the objective of unmanned aerial vehicle \(i\) is to maximize the expected cumulative discounted reward, i.e.

\[\max_{\pi_i}\ \mathbb{E}\left[\sum_{t=0}^{\infty}\gamma^{t}\,r_i\left(s_i^{t},a_i^{t},\mu^{t}\right)\right]\]
This process establishes the cluster mean-field stochastic game model: the reward obtained by each individual is influenced by the state probability distribution of the other unmanned aerial vehicles, and the individuals influence one another through their states and payoff functions.
For a cluster system containing \(N\) unmanned aerial vehicles, given the state \(s_i\) of the \(i\)-th unmanned aerial vehicle and the probability distribution \(\mu\) resulting from the potential field, an optimal strategy \(\pi_i^{*}\) is determined so as to satisfy the Nash equilibrium condition (written here in standard mean-field game form; the originals are figures):

\[V_i\left(s_i,\mu;\pi_i^{*},\pi_{-i}^{*}\right)\ \ge\ V_i\left(s_i,\mu;\pi_i,\pi_{-i}^{*}\right)\quad\text{for all }\pi_i.\]

The value function is defined as:

\[V_i\left(s,\mu\right)=\mathbb{E}\left[\sum_{t=0}^{\infty}\gamma^{t}\,r_i\left(s^{t},a^{t},\mu^{t}\right)\ \middle|\ s^{0}=s\right]\]

and, by the dynamic programming principle, the Bellman equation is obtained:

\[V_i\left(s,\mu\right)=\max_{a}\left[r_i\left(s,a,\mu\right)+\gamma\sum_{s'}P\left(s'\mid s,a,\mu\right)V_i\left(s',\mu'\right)\right]\]
when the method is used for solving the problem, the state transition probability needs to be obtained, meanwhile, the design of rewarding and punishing functions is needed in the training process, and quantized data is obtained through conversion of expert knowledge. Solving the problems can obtain the optimal sequential decision of the single unmanned aerial vehicle, and the whole cluster can emerge the cluster behavior such as autonomous formation, grouping number, killing mode and the like by considering the whole cluster.
According to the unmanned aerial vehicle game countermeasure strategy generation method described above, the behavior and intent of the blue-side unmanned aerial vehicles are analyzed through deep learning before strategy generation, so that the game countermeasure strategy of the red side is dynamically generated and adjusted, improving both the accuracy and the timeliness of decisions.
In one embodiment of the present application, the trajectory prediction model and the attack target prediction model are trained and generated as follows:

For the dynamics of the enemy cluster, spatial situation prediction modeling, inversion and online optimization based on neural networks are developed, realizing dynamic prediction of enemy cluster behavior characteristics.

To address the rapid change of the battlefield situation, the difficulty of prediction during game countermeasures and the multiple constraints of the game model, this embodiment designs a neural network model based on a long short-term memory (LSTM) network to predict the attack path of the enemy cluster, and then designs a classifier for attack-target prediction based on the LSTM and an iterated dilated convolutional neural network. Deep-learning-based target trajectory prediction requires no model of the target at run time, overcoming the weakness of traditional algorithms in which unknown target aerodynamic parameters make acceleration changes difficult to predict.

Specifically, the LSTM layer first extracts the temporal features of the blue-side unmanned aerial vehicle trajectory, and the IDCNN layer then looks for local features, thereby completing target classification. For trajectory prediction, the LSTM has high computation speed and good real-time performance, so it is used on its own in the design. Flight trajectories of enemy aircraft over a period of time are collected to construct a trajectory database for training the trajectory prediction model.
In the LSTM, the data retained from the previous moment is determined first; this part consists of a forget gate, whose function expression is:

\[f_t=\sigma\left(W_f\cdot\left[h_{t-1},x_t\right]+b_f\right)\]

where \(h_{t-1}\) and \(x_t\) respectively represent the output value of the LSTM network at the previous moment and the input value at the current moment, \(\sigma\) represents the activation function, \(W_f\) represents the forget gate weight, \(b_f\) represents the forget gate bias, and \(f_t\) is the output vector of the forget gate.

The forget gate converts the input and the state of the previous moment into a value between 0 and 1 through the activation function, deciding how much of the previous state is transmitted. The input gate then decides the update information of the cell state; its function expressions are:

\[i_t=\sigma\left(W_i\cdot\left[h_{t-1},x_t\right]+b_i\right)\]
\[\tilde{C}_t=\tanh\left(W_C\cdot\left[h_{t-1},x_t\right]+b_C\right)\]

where \(W_i\) and \(b_i\) respectively represent the weight and bias of the input gate, \(W_C\) and \(b_C\) respectively represent the weight and bias for extracting effective information, \(\tilde{C}_t\) represents the current value to be updated into the cell state, and \(i_t\) is the input gate weight controlling which features of \(\tilde{C}_t\) are finally updated into the cell state \(C_t\). The cell state \(C_t\) function expression is:

\[C_t=f_t\odot C_{t-1}+i_t\odot\tilde{C}_t\]

Finally, the output gate outputs the final result; its function expressions are:

\[o_t=\sigma\left(W_o\cdot\left[h_{t-1},x_t\right]+b_o\right)\]
\[h_t=o_t\odot\tanh\left(C_t\right)\]

where \(W_o\) and \(b_o\) are the weight and bias of the output gate, \(o_t\) is the output-gate weight computed from the cell state at the current moment, and \(h_t\) is the final LSTM output, determined by the output gate and the cell state. Through these three gating units and the cell-state transfer, the LSTM can handle time-series problems and can therefore be used for scenarios such as time-series trajectory prediction.
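For concreteness, one step of the LSTM cell exactly as the gate equations above describe it can be written in NumPy as follows; the concatenated [h, x] weight layout and the toy dimensions are assumptions.

```python
# Illustrative sketch only: a single LSTM cell step following the gate equations above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    z = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)         # forget gate
    i_t = sigmoid(W_i @ z + b_i)         # input gate
    c_tilde = np.tanh(W_c @ z + b_c)     # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde   # cell state update
    o_t = sigmoid(W_o @ z + b_o)         # output gate
    h_t = o_t * np.tanh(c_t)             # final output
    return h_t, c_t

H, X = 8, 3
rng = np.random.default_rng(1)
Ws = [rng.standard_normal((H, H + X)) * 0.1 for _ in range(4)]
bs = [np.zeros(H) for _ in range(4)]
h, c = lstm_step(rng.standard_normal(X), np.zeros(H), np.zeros(H),
                 Ws[0], bs[0], Ws[1], bs[1], Ws[2], bs[2], Ws[3], bs[3])
```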
For the attack target prediction model, the currently available partial trajectory data and expert data are first input into the trained trajectory prediction model to obtain several trajectory prediction results; these trajectory prediction results are then used as the input of the classifier, and the attack target is predicted by the classifier.
In one embodiment of the present application, to address the intent-adversarial, highly dynamic and deceptive characteristics of enemy clusters, interpretation and modeling analysis of enemy intent under cluster game countermeasure conditions is developed based on the acquired situation information and an expert knowledge base, using a fuzzy neural network and an inverse reinforcement learning framework. A tactical intent model is constructed based on the fuzzy neural network and trained on countermeasure samples formed from enemy target attributes and the corresponding tactical intents; neural networks formed from different source data are combined to realize intelligent reasoning about the enemy's combat intent, and the enemy cluster state is deduced from its basic actions, improving the accuracy and rapidity of interpretation. Based on the inverse reinforcement learning framework, an intent equilibrium solution analysis is given, the confidence of the predicted tendency is further evaluated, and interpretability is further improved, supporting the planning and generation of our game strategy. The intention interpretation model here is trained and generated as follows:
A training data set is acquired, comprising a plurality of groups of data samples, each data sample comprising the trajectory sequences of a plurality of sample blue-side unmanned aerial vehicles and the corresponding blue-side unmanned aerial vehicle cluster state.

The fuzzy neural network system can be built on the basis of a fuzzy system fused with a neural network and converted into an adaptive network, realizing the learning process of the T-S fuzzy type.
A target fuzzy neural network model is then constructed, comprising an input layer, a fuzzification layer, a fuzzy inference layer and an output layer.

First the input layer is constructed; the input vector corresponding to the input nodes of the input layer is denoted \(x=(x_1,x_2,\dots,x_n)\).

Then the fuzzification layer is constructed, comprising a preset number of fuzzy nodes determined according to the statistical number of combat modes, each fuzzy node corresponding to a different membership function (a figure in the original; a Gaussian form with the two target parameters is assumed here):

\[\mu_{ij}(x_i)=\exp\left(-\frac{(x_i-c_{ij})^{2}}{\sigma_{ij}^{2}}\right)\]

where \(x_i\) is the trajectory sequence corresponding to input node \(i\) in the input layer, \(\mu_{ij}\) is the membership function corresponding to fuzzy node \(j\) connected to input node \(i\), \(c_{ij}\) is the first target parameter, and \(\sigma_{ij}\) is the second target parameter. Each input node \(i\) is connected to \(m\) fuzzy nodes, each with its own membership function \(\mu_{ij}\), \(j=1,\dots,m\).
A fuzzy inference layer is then built; each inference node represents a rule (i.e., an enemy intent such as jamming, strike or kill) and computes the fitness of that rule. Specifically, the fuzzy inference layer comprises a plurality of inference nodes, the calculation rule of each inference node being:

\[w_{j}=\prod_{i}\mu_{ij}(x_i)\]

Finally, a clarification (defuzzification) layer is built to clarify and output the data of the fuzzy inference layer. The output layer comprises a plurality of output nodes, the clarification function of each output node being:

\[y_{k}=\frac{\sum_{j}p_{kj}\,w_{j}}{\sum_{j}w_{j}}\]

where \(p_{kj}\) is the third target parameter. The training data set is input into the constructed target fuzzy neural network model, and the first, second and third target parameters in the model are adjusted based on a hybrid algorithm combining back propagation and least squares, so as to obtain the pre-trained intention interpretation model.
Specifically, in the training stage the number of nodes in each layer is determined in advance, the fuzzy inference layer is given by specific operation rules, and the parameters to be learned are mainly the first, second and third target parameters above. Then, given countermeasure samples of the target attributes obtained after sensor data fusion and the corresponding tactical intents, the target behavior intent is computed comprehensively from the different data sources and converted into a training data set. After data preparation, the parameter training stage begins: the parameters of the system are trained and adjusted by back propagation, or by the hybrid algorithm of back propagation and least squares. In the hybrid algorithm, least-squares estimation identifies the consequent weight parameters in the forward pass up to the clarification layer, while the error signal is back-propagated in the reverse pass and the membership function parameters are updated by back propagation. The hybrid method reduces the search-space scale of the back propagation method and thus improves the training speed of the fuzzy neural network.
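A minimal sketch of a forward pass through the three-layer fuzzy network described above follows, using the Gaussian membership form assumed earlier (the patent gives the membership function only as a figure); the dimensions are toy values.

```python
# Illustrative sketch only: fuzzification -> rule fitness -> normalized weighted output.
import numpy as np

def fuzzy_forward(x, c, sigma, p):
    # x: (n,) inputs; c, sigma: (n, m) membership centers/widths for m fuzzy nodes
    # per input; p: (m, k) consequent parameters for k output nodes
    mu = np.exp(-((x[:, None] - c) ** 2) / sigma ** 2)   # fuzzification layer
    w = mu.prod(axis=0)                                  # inference layer: rule fitness
    w_norm = w / w.sum()                                 # normalization (clarification)
    return w_norm @ p                                    # crisp outputs

x = np.array([0.2, -0.5])
c = np.zeros((2, 3))
sigma = np.ones((2, 3))
p = np.arange(12.0).reshape(3, 4)
y = fuzzy_forward(x, c, sigma, p)   # (4,) e.g. scores over intent classes
```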
In one embodiment of the present application, the method further comprises storing the acquired cluster states of both sides' unmanned aerial vehicles and the corresponding scene information in the game countermeasure mechanism library, for optimizing and updating the intention interpretation model and the cluster mean-field stochastic game model.

Simulation scenarios are constructed according to different cluster combat task requirements, and an expert-system strategy generator is built. On the premise that the enemy cluster randomly selects its game strategy, our cluster collects battlefield situation information and uses the expert system to generate game countermeasure strategies; a game countermeasure mechanism library is constructed, the strategy selections and the battlefield evolution process are stored, and data support is provided for the design of subsequent game algorithms and strategy models.
In one embodiment of the present application, a simulation example is provided in which unmanned aerial vehicle game countermeasure strategy generation is applied to a typical scenario.
The specific scenario settings are as follows. Enemy cluster: 10 unmanned aerial vehicles strike our 2 ground target areas according to a fixed strategy from a certain height beyond 1500 m. Our cluster: 20 unmanned aerial vehicles in a circling standby flight state at a preset position (500 m from the target areas, at a certain height). Our 2 ground target areas are 100 m apart, and each has an area of 10 square meters (the area and speed units below are rendered as figures in the original and are assumed here). The parameters are set as follows: our/enemy unmanned aerial vehicle field-of-view distance: 150 m / 100 m; our/enemy unmanned aerial vehicle maximum speed: 5 m/s / 4 m/s. The other unmanned aerial vehicle dynamics parameters are set according to typical values.
In the initialization stage, the game countermeasure scenario is first established, and then a nonlinear mathematical model of cluster dynamics is established to participate in the game countermeasure simulation process. After loading the dynamics model, both clusters execute their initialization strategies and update their detection states in real time; once a cluster individual detects an individual of the other cluster, the game countermeasure phase begins.

In the game countermeasure phase, the enemy cluster strategy library is fixed and assigned to groups in the initialization stage; after detecting one of our individuals, the enemy cluster wakes up the tasks within the group and executes them. Unlike the enemy cluster strategy, our cluster strategy is variable: first, the cluster behavior and trajectory prediction module realizes behavior prediction from the enemy cluster situation; then, on the basis of the prediction results, intent interpretation is performed on the enemy cluster, and according to the enemy cluster situation, expert knowledge assists the intelligently learned game strategy, with the dominant strategy selected from the strategy library to play against the enemy cluster.
Fig. 4 is a schematic diagram of a first unmanned aerial vehicle game countermeasure simulation provided in an embodiment of the present application, fig. 5 is a schematic diagram of a second unmanned aerial vehicle game countermeasure simulation, and fig. 6 is a schematic diagram of the change in the numbers of unmanned aerial vehicles. The game stage is shown in figs. 4 and 5. Triangles represent our cluster and circles represent the enemy cluster. At the beginning, our cluster moves toward the enemy in the preset patrol formation, while the enemy cluster forms a certain formation grouping to strike the ground targets (see fig. 4). In the game stage, our cluster first finds enemy individuals within its field of view and passes the enemy information within the cluster; it then realizes enemy cluster trajectory prediction with the LSTM network and cracks the opponent's intent based on the trajectory prediction results, deducing in this scenario simulation that the enemy cluster's combat intent is to attack our ground targets along a fixed route (see fig. 5). Based on the pre-trained expert-knowledge-assisted game technique, our cluster derives from the mechanism library that the 2-versus-1 capture attack mode has the lowest cost, and autonomously forms an attack formation. During the strike, our game strategy changes dynamically: if a target is eliminated, the unmanned aerial vehicle automatically matches the next attack target according to the battle situation (in fig. 5, a dark triangle indicates one of our unmanned aerial vehicles has entered the interception state, a light triangle indicates one still in the standby state, a large black circle indicates an enemy unmanned aerial vehicle that has been locked for attack, and a small light circle indicates one that has not).
To further illustrate the effectiveness of the game strategy, as shown in Fig. 6, the two sides begin to engage at about 120 s, and the enemy cluster is completely destroyed at about 180 s, at which point 10 my unmanned aerial vehicles remain; the best strike at minimum cost is thus achieved while ensuring the safety of the ground target.
Aiming at problems such as low decision accuracy caused by enemy behavioral deception during unmanned aerial vehicle cluster game countermeasures, the unmanned aerial vehicle game countermeasure strategy generation method provided by the embodiment of the present application first predicts enemy behavior and interprets enemy intent before strategy generation. Neural-network-based methods realize dynamic prediction of unmanned aerial vehicle cluster behavior characteristics; a tactical intent reasoning model is then constructed based on a fuzzy neural network and inverse reinforcement learning, the neural networks are trained with data from different sources to realize intelligent reasoning about enemy intent, and the equilibrium solution analysis and confidence of the enemy intent are provided, improving decision accuracy. Aiming at the high dynamics of the battlefield environment and of enemy strategies, for which most current game countermeasure strategy generation algorithms lack speed and pertinence, a game countermeasure strategy set is generated with expert-knowledge assistance: a game countermeasure mechanism library is constructed around typical task links to store experience, an expert-knowledge-assisted intelligent learning game strategy algorithm is built, and the my strategies are dynamically adjusted, achieving high real-time performance and strong pertinence in the decision process.
Based on the same inventive concept, an embodiment of the present application further provides an unmanned aerial vehicle game countermeasure strategy generation system corresponding to the above unmanned aerial vehicle game countermeasure strategy generation method. Since the principle by which the system solves the problem is similar to that of the method, the implementation of the system can refer to the implementation of the method, and repeated description is omitted.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an unmanned plane game countermeasure policy generation system according to an embodiment of the present application. As shown in fig. 7, the unmanned plane game countermeasure policy generation system includes:
the control module is used for determining the execution action of each red unmanned aerial vehicle at the next moment and controlling the red unmanned aerial vehicle to move according to the determined execution action, and comprises:
the track prediction unit 101 is configured to acquire a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle and input it into a pre-trained track prediction model, so as to output a track prediction result of each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle;
the attack target prediction unit 102 is configured to input track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model, so as to output attack target prediction results of each target blue unmanned aerial vehicle;
the intention interpretation unit 103 is configured to input the track prediction results of all the target blue unmanned aerial vehicles and the attack target prediction results of all the target blue unmanned aerial vehicles into a pre-trained intention interpretation model, so as to output a blue unmanned aerial vehicle cluster state;
the game countermeasure unit 104 is configured to input the cluster state of the blue unmanned aerial vehicle, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model, so as to output a preferred motion of the red unmanned aerial vehicle, and control the red unmanned aerial vehicle to move according to the determined preferred motion; the target blue unmanned aerial vehicle is a blue unmanned aerial vehicle in the red unmanned aerial vehicle monitoring range, and other red unmanned aerial vehicles are red unmanned aerial vehicles in the red unmanned aerial vehicle monitoring range.
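Functionally, the four units chain together once per decision step. A schematic wiring is sketched below; the class, method, and argument names are hypothetical, with the unit objects standing in for the trained models (track predictor, attack-target predictor, intent interpreter, cluster average field game model) of the system.

```python
# Hypothetical sketch of the control module's per-step dataflow through the
# four units described above.
class ControlModule:
    def __init__(self, track_unit, attack_unit, intent_unit, game_unit):
        self.track_unit = track_unit
        self.attack_unit = attack_unit
        self.intent_unit = intent_unit
        self.game_unit = game_unit

    def step(self, red_uav, blue_histories, rel_blue, rel_red, motion_state):
        tracks = self.track_unit(blue_histories)           # track prediction
        targets = self.attack_unit(tracks)                 # attack targets
        cluster_state = self.intent_unit(tracks, targets)  # blue cluster state
        action = self.game_unit(cluster_state, rel_blue,   # preferred action
                                rel_red, motion_state)
        red_uav.execute(action)                            # move the red UAV
        return action
```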
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic device 800 includes a processor 810, a memory 820, and a bus 830.
The memory 820 stores machine-readable instructions executable by the processor 810. When the electronic device 800 is running, the processor 810 communicates with the memory 820 through the bus 830, and when the machine-readable instructions are executed by the processor 810, the steps of the unmanned aerial vehicle game countermeasure strategy generation method in the method embodiment shown in fig. 1 can be executed; for the specific implementation, refer to the method embodiment, which is not repeated here.
The embodiment of the present application further provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the unmanned aerial vehicle game countermeasure strategy generation method in the method embodiment shown in fig. 1 can be executed; for the specific implementation, refer to the method embodiment, which is not repeated here.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a logical functional division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If implemented in the form of software functional units and sold or used as a stand-alone product, the functions may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are merely specific implementations of the present application, used to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application is described in detail with reference to the foregoing embodiments, a person skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions for some of the technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and shall be included in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An unmanned aerial vehicle game countermeasure policy generation method, the method comprising:
for each red unmanned aerial vehicle, determining the execution action of the red unmanned aerial vehicle at the next moment and controlling the red unmanned aerial vehicle to move according to the determined execution action in the following manner:
acquiring a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle, and inputting it into a pre-trained track prediction model, so as to output a track prediction result of each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle;
inputting track prediction results of all target blue-side unmanned aerial vehicles corresponding to all red-side unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model so as to output attack target prediction results of each target blue-side unmanned aerial vehicle;
inputting track prediction results of all target blue unmanned aerial vehicles and attack target prediction results of all target blue unmanned aerial vehicles into a pre-trained intention interpretation model so as to output a blue unmanned aerial vehicle cluster state;
inputting the cluster state of the blue-side unmanned aerial vehicle, the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicle at the current moment, the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment and the motion state of the red-side unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model so as to output the preferred motion of the red-side unmanned aerial vehicle and control the red-side unmanned aerial vehicle to move according to the determined preferred motion;
The target blue unmanned aerial vehicle is a blue unmanned aerial vehicle in the red unmanned aerial vehicle monitoring range, and other red unmanned aerial vehicles are red unmanned aerial vehicles in the red unmanned aerial vehicle monitoring range.
2. The method of claim 1, wherein the cluster average field random game model outputs the preferred action of each red unmanned aerial vehicle in the following manner:
determining the action space of the red unmanned aerial vehicle cluster from a game countermeasure mechanism library according to the state of the blue unmanned aerial vehicle cluster;
determining Markov transition probability distribution of the red unmanned aerial vehicle according to the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment and the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment;
and solving the execution action meeting Nash equilibrium conditions in the action space of the red unmanned aerial vehicle cluster to serve as the preferable action of the red unmanned aerial vehicle by taking the Markov transition probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables and taking the execution action of the red unmanned aerial vehicle as dependent variables.
3. The method according to claim 2, wherein the state of the blue unmanned aerial vehicle cluster at least includes formation, grouping and combat mode, and the step of determining the action space of the red unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the state of the blue unmanned aerial vehicle cluster specifically includes:
according to the cluster state of the blue unmanned aerial vehicle and the number of red unmanned aerial vehicles, matching a corresponding unmanned aerial vehicle cluster countermeasure scheme from the game countermeasure mechanism library, wherein the game countermeasure mechanism library comprises a plurality of unmanned aerial vehicle cluster countermeasure schemes, and each unmanned aerial vehicle cluster countermeasure scheme is used for indicating the execution actions, arranged in time order, of each red unmanned aerial vehicle;
and determining the execution actions which are arranged according to the time sequence and correspond to each red unmanned aerial vehicle according to the matched unmanned aerial vehicle cluster countermeasure scheme so as to generate an action space of the red unmanned aerial vehicle cluster.
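To make claim 3 concrete, a mechanism library keyed by blue cluster state and red UAV count could be organized as below; the keys, schemes, and action names are purely illustrative assumptions, not the library of the invention.

```python
# Hypothetical sketch of matching a countermeasure scheme from the game
# countermeasure mechanism library (claim 3): the key combines blue cluster
# state with the number of red UAVs; each scheme lists per-UAV actions in
# time order, which together form the red cluster's action space.
MECHANISM_LIBRARY = {
    ("fixed-route strike", 2, 4): {
        "uav_0": ["intercept_lead", "reengage"],
        "uav_1": ["intercept_lead", "reengage"],
        "uav_2": ["flank_left", "standby"],
        "uav_3": ["flank_right", "standby"],
    },
}

def action_space(blue_state, red_count):
    key = (blue_state["combat_mode"], blue_state["grouping"], red_count)
    scheme = MECHANISM_LIBRARY[key]
    # the action space: one time-ordered action sequence per red UAV
    return list(scheme.values())

space = action_space({"combat_mode": "fixed-route strike", "grouping": 2}, 4)
print(space[0])  # -> ['intercept_lead', 'reengage']
```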
4. The method of claim 2, wherein the relative position information comprises a line-of-sight angle between two unmanned aerial vehicles, an entry angle between the velocity vector of a target unmanned aerial vehicle and the line of sight, an angle between the velocities of two unmanned aerial vehicles, a distance between two unmanned aerial vehicles, and a relative velocity between two unmanned aerial vehicles, and the Markov transition probability distribution of the red unmanned aerial vehicle is determined in the following manner:

the total potential field energy $E_i$ of the red unmanned aerial vehicle $i$ is determined by the following formula:

$$E_i = \phi^{a}_{ij} + \phi^{d}_{ij} + \phi^{v}_{ij} + \psi^{a}_{ik} + \psi^{d}_{ik} + \psi^{v}_{ik}$$

wherein $\phi^{a}_{ij}$ is the angular cooperative potential field between this red unmanned aerial vehicle $i$ and another red unmanned aerial vehicle $j$, $\phi^{d}_{ij}$ is the distance cooperative potential field between this red unmanned aerial vehicle $i$ and the other red unmanned aerial vehicle $j$, $\phi^{v}_{ij}$ is the velocity cooperative potential field between this red unmanned aerial vehicle $i$ and the other red unmanned aerial vehicle $j$, $\psi^{a}_{ik}$ is the angle potential field between this red unmanned aerial vehicle $i$ and the target blue unmanned aerial vehicle $k$, $\psi^{d}_{ik}$ is the distance potential field between this red unmanned aerial vehicle $i$ and the target blue unmanned aerial vehicle $k$, and $\psi^{v}_{ik}$ is the velocity potential field between this red unmanned aerial vehicle $i$ and the target blue unmanned aerial vehicle $k$ [the component formulas appear only as images in the source];

the Markov transition probability distribution $P\big(s_i(t), a_i(t)\big)$ of the red unmanned aerial vehicle is then determined from the total potential field energy $E_i$ [the formula likewise appears only as an image in the source], wherein $s_i(t)$ is the motion state of this red unmanned aerial vehicle $i$ at the current moment $t$, and $a_i(t)$ is the execution action of this red unmanned aerial vehicle $i$ at the current moment $t$.
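Because the formulas of claim 4 survive only as images, the sketch below assumes one common construction: the total energy is the sum of the six named potential-field components, and a Boltzmann (softmax) map turns the energies of candidate actions into a transition probability distribution. The functional forms are assumptions, not the claimed formulas.

```python
import numpy as np

# Assumed construction: total potential field energy as the sum of the six
# named components, and a softmax over negated energies as the Markov
# transition probability distribution (lower energy -> higher probability).
def total_energy(coop_ang, coop_dist, coop_vel, game_ang, game_dist, game_vel):
    return coop_ang + coop_dist + coop_vel + game_ang + game_dist + game_vel

def transition_probabilities(energies, temperature=1.0):
    """Map candidate-action energies to a probability distribution."""
    e = np.asarray(energies, dtype=float)
    logits = -e / temperature
    logits -= logits.max()               # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# energies of three candidate actions for one red UAV (toy values)
E = [total_energy(0.2, 0.5, 0.1, 0.8, 1.2, 0.3),
     total_energy(0.1, 0.4, 0.1, 0.5, 0.9, 0.2),
     total_energy(0.3, 0.7, 0.2, 1.1, 1.5, 0.4)]
print(transition_probabilities(E))       # sums to 1, favors the 2nd action
```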
5. The method according to claim 4, wherein the step of solving, in the action space, the execution action satisfying the Nash equilibrium condition as the preferred action of the red unmanned aerial vehicle, by taking the Markov transition probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables and the execution action of the red unmanned aerial vehicle as the dependent variable, specifically comprises:

determining the execution action satisfying the Nash equilibrium condition as the preferred action of the red unmanned aerial vehicle through the following formula:

$$a_i^{*} = \arg\max_{a_i(t)} V_i\big(s_i(t), a_i(t)\big)$$

wherein $V_i$ is the value function of this red unmanned aerial vehicle $i$, $a_i^{*}$ is the preferred action of this red unmanned aerial vehicle $i$, and $\gamma$ is the discount rate [the explicit form of $V_i$, including the constraint on $\gamma$, appears only as an image in the source].
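In the same spirit, the preferred-action selection of claim 5 can be illustrated as a discounted best-response computation; the transition probabilities, rewards, and discount rate below are toy values, not the claimed model.

```python
import numpy as np

# Illustrative best-response selection: evaluate each action in the action
# space by an expected discounted value and pick the maximizer.
GAMMA = 0.9

def action_value(p_next, reward, v_next, gamma=GAMMA):
    """Q(s, a) = r(s, a) + gamma * sum over s' of P(s'|s, a) * V(s')."""
    return reward + gamma * float(np.dot(p_next, v_next))

v_next = np.array([1.0, 0.2, -0.5])       # value estimates of next states
actions = {                               # action -> (P(s'|s,a), reward)
    "intercept": (np.array([0.7, 0.2, 0.1]), 0.5),
    "standby":   (np.array([0.1, 0.8, 0.1]), 0.0),
    "retreat":   (np.array([0.1, 0.2, 0.7]), -0.2),
}

preferred = max(actions,
                key=lambda a: action_value(actions[a][0], actions[a][1], v_next))
print(preferred)  # -> intercept
```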
6. The method as recited in claim 2, further comprising:
and storing the acquired cluster states and corresponding scene information of the red and blue unmanned aerial vehicles in the game countermeasure mechanism library, for optimizing and updating the intention interpretation model and the cluster average field random game model.
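Claim 6's experience storage amounts to appending observed states and scene information to the mechanism library for later model updates; a trivial sketch with illustrative record fields:

```python
# Hypothetical sketch of claim 6: buffer red/blue cluster states and scene
# information so the intention interpretation model and the cluster average
# field game model can later be re-trained on the accumulated experience.
experience_buffer = []

def store_experience(red_state, blue_state, scene_info):
    experience_buffer.append({
        "red_state": red_state,
        "blue_state": blue_state,
        "scene": scene_info,
    })

store_experience({"formation": "wedge"},
                 {"combat_mode": "fixed-route strike"},
                 {"t": 120.0, "ground_target_safe": True})
print(len(experience_buffer))  # -> 1
```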
7. The method of claim 3, wherein the intention interpretation model is generated by training in the following manner:
acquiring a training data set, wherein the training data set comprises a plurality of groups of data samples, and each data sample comprises a plurality of sample blue unmanned aerial vehicle track sequences and corresponding blue unmanned aerial vehicle cluster states;
constructing a target fuzzy neural network model, wherein the target fuzzy neural network model comprises an input layer, a fuzzification layer, a fuzzy inference layer and an output layer [the formulas below are reconstructed in the standard fuzzy-neural-network form; the originals appear only as images in the source];

the fuzzification layer comprises a preset number of fuzzy nodes determined according to the statistical quantity of combat modes, each fuzzy node corresponds to a different membership function, and the membership function is:

$$\mu_{ij}(x_i) = \exp\!\left(-\frac{(x_i - c_{ij})^2}{\sigma_{ij}^{2}}\right)$$

wherein $x_i$ is the track sequence corresponding to input node $i$ in the input layer, $\mu_{ij}$ is the membership function corresponding to fuzzy node $j$ connected to input node $i$, $c_{ij}$ is the first target parameter, and $\sigma_{ij}$ is the second target parameter;

the fuzzy inference layer comprises a plurality of inference nodes, and the calculation rule of each inference node $k$ is the product of the membership degrees of the fuzzy nodes connected to it:

$$o_k = \prod_{i} \mu_{ij}(x_i)$$

the output layer comprises a plurality of output nodes, and the defuzzification function of each output node $m$ is:

$$y_m = \sum_{k} w_{mk}\, o_k$$

wherein $w_{mk}$ is the third target parameter;

and inputting the training data set into the constructed target fuzzy neural network model, and adjusting the first target parameter, the second target parameter and the third target parameter in the target fuzzy neural network model based on a hybrid algorithm combining back propagation and the least squares method, so as to obtain the pre-trained intention interpretation model.
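Since the formulas of claim 7 also survive only as images, the sketch below assumes the standard ANFIS-style forms that fit the claim's parameter names — Gaussian memberships (first and second target parameters), product-rule inference, and a weighted-sum output (third target parameter) — which is also the combination classically trained by a hybrid backpropagation/least-squares algorithm. All sizes and values are illustrative.

```python
import numpy as np

# Assumed ANFIS-style forward pass: Gaussian membership functions with
# centers c (first target parameter) and widths sigma (second target
# parameter), product-rule inference nodes, and a weighted-sum output
# layer with weights w (third target parameter).
rng = np.random.default_rng(0)
n_inputs, n_fuzzy, n_outputs = 4, 3, 2    # illustrative layer sizes

c = rng.normal(size=(n_inputs, n_fuzzy))        # first target parameter
sigma = np.full((n_inputs, n_fuzzy), 1.0)       # second target parameter
w = rng.normal(size=(n_outputs, n_fuzzy))       # third target parameter

def forward(x):
    # fuzzification: membership of each input in each fuzzy set
    mu = np.exp(-((x[:, None] - c) ** 2) / sigma ** 2)  # (n_inputs, n_fuzzy)
    # inference: product over inputs gives each rule's firing strength
    o = mu.prod(axis=0)                                  # (n_fuzzy,)
    # output layer: weighted sum (defuzzification)
    return w @ o                                         # (n_outputs,)

x = rng.normal(size=n_inputs)              # e.g. features of a track sequence
print(forward(x))                          # coarse blue-cluster-state scores
```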
8. An unmanned aerial vehicle game countermeasure policy generation system, the system comprising:
the control module is used for determining the execution action of each red unmanned aerial vehicle at the next moment and controlling the red unmanned aerial vehicle to move according to the determined execution action, and the control module comprises:
the track prediction unit is used for acquiring a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle and inputting it into a pre-trained track prediction model, so as to output a track prediction result of each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle;
The attack target prediction unit is used for inputting track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model so as to output attack target prediction results of each target blue unmanned aerial vehicle;
the intention interpretation unit is used for inputting track prediction results of all the target blue unmanned aerial vehicles and attack target prediction results of all the target blue unmanned aerial vehicles into a pre-trained intention interpretation model so as to output a blue unmanned aerial vehicle cluster state;
the game countermeasure unit is used for inputting the cluster state of the blue unmanned aerial vehicle, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model so as to output the preferred motion of the red unmanned aerial vehicle and control the red unmanned aerial vehicle to move according to the determined preferred motion;
the target blue unmanned aerial vehicle is a blue unmanned aerial vehicle in the red unmanned aerial vehicle monitoring range, and other red unmanned aerial vehicles are red unmanned aerial vehicles in the red unmanned aerial vehicle monitoring range.
9. An electronic device, comprising: a processor, a memory and a bus, said memory storing machine readable instructions executable by said processor, said processor and said memory in communication over the bus when the electronic device is running, said processor executing said machine readable instructions to perform the steps of the unmanned aerial vehicle game countermeasure policy generation method of any of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the unmanned aerial vehicle game countermeasure policy generation method of any of claims 1 to 7.