CN116360503B - Unmanned plane game countermeasure strategy generation method and system and electronic equipment - Google Patents


Info

Publication number
CN116360503B
CN116360503B
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
red
target
blue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310628021.7A
Other languages
Chinese (zh)
Other versions
CN116360503A (en)
Inventor
刘昊
吕金虎
王新迪
高庆
刘德元
钟森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Academy of Mathematics and Systems Science of CAS
Original Assignee
Beihang University
Academy of Mathematics and Systems Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University, Academy of Mathematics and Systems Science of CAS filed Critical Beihang University
Priority to CN202310628021.7A priority Critical patent/CN116360503B/en
Publication of CN116360503A publication Critical patent/CN116360503A/en
Application granted granted Critical
Publication of CN116360503B publication Critical patent/CN116360503B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/106Change initiated in response to external conditions, e.g. avoidance of elevated terrain or of no-fly zones
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The application provides an unmanned aerial vehicle game countermeasure strategy generation method, system and electronic device, relating to the technical field of aircraft control. The method includes: inputting the track prediction results of all target blue unmanned aerial vehicles and the attack target prediction results of all target blue unmanned aerial vehicles into a pre-trained intention interpretation model to output the blue unmanned aerial vehicle cluster state; inputting the blue unmanned aerial vehicle cluster state, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicles at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model to output the preferred action of the red unmanned aerial vehicle; and controlling the red unmanned aerial vehicle to move according to the determined preferred action, thereby improving the accuracy of unmanned aerial vehicle game defense strategy generation.

Description

Unmanned plane game countermeasure strategy generation method and system and electronic equipment
Technical Field
The application relates to the technical field of aircraft control, in particular to a method, a system and electronic equipment for generating a game countermeasure strategy of an unmanned aerial vehicle.
Background
The autonomous generation technology for unmanned aerial vehicle game defense strategies refers to a technology by which an unmanned aerial vehicle cluster autonomously generates a game strategy in a combat environment, based on the battlefield situation and the perceived information of both friendly and enemy sides, so as to counter the enemy's combat intent, protect friendly ground targets and achieve the friendly combat objective. In the prior art, existing strategy generation methods have low decision accuracy in scenarios where the enemy unmanned aerial vehicle cluster performs deception and feint maneuvers, so a strategy generation algorithm with higher decision accuracy is needed.
Disclosure of Invention
Therefore, the application aims to provide a method, a system and an electronic device for generating an unmanned aerial vehicle game countermeasure strategy, so as to improve the accuracy of unmanned aerial vehicle game defense strategy generation.
In a first aspect, the present application provides a method for generating an unmanned aerial vehicle game countermeasure policy, the method comprising: for each red unmanned aerial vehicle, determining the action to be executed by the red unmanned aerial vehicle at the next moment and controlling the red unmanned aerial vehicle to move according to the determined action, in the following manner: acquiring a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle, and inputting it into a pre-trained track prediction model to output a track prediction result for each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle; inputting the track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model to output an attack target prediction result for each target blue unmanned aerial vehicle; inputting the track prediction results and the attack target prediction results of all target blue unmanned aerial vehicles into a pre-trained intention interpretation model to output the blue unmanned aerial vehicle cluster state; and inputting the blue unmanned aerial vehicle cluster state, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicles at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model to output the preferred action of the red unmanned aerial vehicle, and controlling the red unmanned aerial vehicle to move according to the determined preferred action; wherein a target blue unmanned aerial vehicle is a blue unmanned aerial vehicle within the monitoring range of the red unmanned aerial vehicle, and the other red unmanned aerial vehicles are red unmanned aerial vehicles within the monitoring range of the red unmanned aerial vehicle.
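The four-stage pipeline of the first aspect can be sketched as a thin orchestration function. The four callables below are hypothetical stand-ins for the pre-trained models (trajectory prediction, attack-target prediction, intention interpretation, game policy); their names and signatures are illustrative assumptions, not the application's actual implementation:

```python
def next_action(history_by_blue, rel_blue, rel_red, motion_state,
                predict_traj, predict_targets, interpret_intent, game_policy):
    """Chain the claimed pipeline for one red drone at one decision step.

    history_by_blue: {blue_id: historical track sequence}
    rel_blue / rel_red: relative position info vs. monitored blue / red drones
    motion_state: this red drone's current motion state
    The four model callables are injected so any trained model can be used."""
    # Step 1: trajectory prediction per monitored blue drone
    trajs = {b: predict_traj(seq) for b, seq in history_by_blue.items()}
    # Step 2: attack-target prediction from all trajectory predictions
    targets = predict_targets(trajs)
    # Step 3: intention interpretation -> blue cluster state
    cluster_state = interpret_intent(trajs, targets)
    # Step 4: cluster mean-field game policy -> preferred action
    return game_policy(cluster_state, rel_blue, rel_red, motion_state)
```

With stub models this runs end-to-end, which is useful for wiring tests before the real networks are plugged in.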
Preferably, the cluster average field random game model outputs the preferred action of each red unmanned aerial vehicle by: determining the action space of the red unmanned aerial vehicle cluster from a game countermeasure mechanism library according to the blue unmanned aerial vehicle cluster state; determining the Markov transition probability distribution of the red unmanned aerial vehicle according to the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment and the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment; and, taking the Markov transition probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables and the action executed by the red unmanned aerial vehicle as the dependent variable, solving for the action in the action space of the red unmanned aerial vehicle cluster that satisfies the Nash equilibrium condition, as the preferred action of the red unmanned aerial vehicle.
Preferably, the blue unmanned aerial vehicle cluster state at least includes formation, grouping and combat mode, and the step of determining the action space of the red unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the blue unmanned aerial vehicle cluster state specifically includes: matching a corresponding unmanned aerial vehicle cluster countermeasure scheme from the game countermeasure mechanism library according to the blue unmanned aerial vehicle cluster state and the number of red unmanned aerial vehicles, wherein the game countermeasure mechanism library includes a plurality of unmanned aerial vehicle cluster countermeasure schemes, and each scheme indicates the actions each red unmanned aerial vehicle is to execute, arranged in time order; and determining, from the matched scheme, the time-ordered actions corresponding to each red unmanned aerial vehicle, so as to generate the action space of the red unmanned aerial vehicle cluster.
Preferably, the relative position information includes the line-of-sight angle between the two unmanned aerial vehicles, the entry angle between the speed vector of the target unmanned aerial vehicle and the line of sight, the angle between the velocities of the two unmanned aerial vehicles, the distance between the two unmanned aerial vehicles, and the relative speed between the two unmanned aerial vehicles, and the Markov transition probability distribution of the red unmanned aerial vehicle is determined as follows:

The total potential field energy $E_i$ of the red unmanned aerial vehicle is determined by the following formula:

$$E_i=\sum_{j\in\mathcal{N}_i^{R}}\left(U_{ij}^{\mathrm{ang}}+U_{ij}^{\mathrm{dist}}+U_{ij}^{\mathrm{vel}}\right)+\sum_{k\in\mathcal{N}_i^{B}}\left(V_{ik}^{\mathrm{ang}}+V_{ik}^{\mathrm{dist}}+V_{ik}^{\mathrm{vel}}\right)$$

wherein $U_{ij}^{\mathrm{ang}}$ is the angle cooperative potential field between this red unmanned aerial vehicle $i$ and another red unmanned aerial vehicle $j$, $U_{ij}^{\mathrm{dist}}$ is the distance cooperative potential field between them, and $U_{ij}^{\mathrm{vel}}$ is the speed cooperative potential field between them; $V_{ik}^{\mathrm{ang}}$ is the angle power potential field between this red unmanned aerial vehicle $i$ and target blue unmanned aerial vehicle $k$, $V_{ik}^{\mathrm{dist}}$ is the distance power potential field between them, and $V_{ik}^{\mathrm{vel}}$ is the velocity power potential field between them; $\mathcal{N}_i^{R}$ is the set of red unmanned aerial vehicles corresponding to red unmanned aerial vehicle $i$, and $\mathcal{N}_i^{B}$ is the set of blue unmanned aerial vehicles corresponding to red unmanned aerial vehicle $i$.
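As an illustration, the summation over cooperative and adversarial potential-field terms can be computed as below. The closed forms of the six individual field terms are not given in this passage, so the quadratic distance/velocity terms and the velocity-angle term used here are assumptions; only the overall two-sum structure mirrors the patent's formula:

```python
import math

def total_potential_energy(red_i, other_reds, target_blues,
                           w_coop=(1.0, 1.0, 1.0), w_adv=(1.0, 1.0, 1.0)):
    """Sum cooperative potential fields (vs. friendly drones) and
    adversarial ('power') potential fields (vs. target blue drones).

    Each drone is a dict with 'pos' = (x, y, z) and 'vel' = (vx, vy, vz);
    nonzero velocities are assumed. The per-term forms are illustrative."""
    def dist(a, b):
        return math.dist(a['pos'], b['pos'])

    def vel_diff(a, b):
        return math.dist(a['vel'], b['vel'])

    def angle(a, b):
        # angle between the two velocity vectors (assumes nonzero speeds)
        dot = sum(x * y for x, y in zip(a['vel'], b['vel']))
        na, nb = math.hypot(*a['vel']), math.hypot(*b['vel'])
        return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

    energy = 0.0
    for j in other_reds:            # cooperative terms
        energy += (w_coop[0] * angle(red_i, j)
                   + w_coop[1] * dist(red_i, j) ** 2
                   + w_coop[2] * vel_diff(red_i, j) ** 2)
    for k in target_blues:          # adversarial terms
        energy += (w_adv[0] * angle(red_i, k)
                   + w_adv[1] * dist(red_i, k) ** 2
                   + w_adv[2] * vel_diff(red_i, k) ** 2)
    return energy
```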
determining the Markov transition probability distribution of the red unmanned aerial vehicle through the following formula:

$$p_i\left(s_i^{t+1}\mid s_i^{t},a_i^{t}\right)=\frac{\exp\left(-E_i\left(s_i^{t+1}\right)\right)}{\sum_{s'\in\mathcal{S}}\exp\left(-E_i\left(s'\right)\right)}$$

wherein $s_i^{t}$ is the motion state of this red unmanned aerial vehicle $i$ at the current moment $t$, and $a_i^{t}$ is the action executed by this red unmanned aerial vehicle $i$ at the current moment $t$.
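One common way to turn a potential-field energy into a transition distribution is a Boltzmann (softmax) weighting over candidate next states, favouring low-energy states; the original formula image is not recoverable here, so this functional form is an assumption for illustration:

```python
import math

def transition_distribution(energies, temperature=1.0):
    """Boltzmann distribution over candidate next states.

    `energies` maps each candidate next state to its total potential-field
    energy E_i; lower energy gets higher probability."""
    # subtract the minimum before exponentiating, for numerical stability
    m = min(energies.values())
    weights = {s: math.exp(-(e - m) / temperature)
               for s, e in energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}
```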
Preferably, the step of taking the Markov transition probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables and solving for the action in the action space that satisfies the Nash equilibrium condition, as the preferred action of the red unmanned aerial vehicle, specifically includes:

determining the action that satisfies the Nash equilibrium condition as the preferred action of the red unmanned aerial vehicle through the following formula:

$$a_i^{*}=\arg\max_{a\in\mathcal{A}}\ \mathbb{E}\left[\sum_{t=0}^{\infty}\gamma^{t}\,r_i\left(s_i^{t},a_i^{t}\right)\right]$$

wherein $r_i$ is the reward function of this red unmanned aerial vehicle $i$, $a_i^{*}$ is the preferred action of this red unmanned aerial vehicle $i$, $\gamma$ is the discount rate, and $\mathcal{A}$ is the action space.
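For a single drone with the other players' behaviour held fixed, the equilibrium-action solve described above reduces to maximizing expected discounted return over the action space. The tabular value-iteration sketch below is a standard MDP solver used as an illustration, not the patent's own algorithm:

```python
def value_iteration(states, actions, P, R, gamma=0.9, iters=200):
    """Solve for max expected discounted return.

    P[s][a] is a dict {next_state: probability}; R[s][a] is the immediate
    reward. Returns the value function and the greedy (preferred-action)
    policy."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        # Bellman optimality backup
        V = {s: max(R[s][a] + gamma * sum(p * V[s2]
                                          for s2, p in P[s][a].items())
                    for a in actions)
             for s in states}
    # greedy policy w.r.t. the converged values
    policy = {s: max(actions,
                     key=lambda a: R[s][a] + gamma *
                     sum(p * V[s2] for s2, p in P[s][a].items()))
              for s in states}
    return V, policy
```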
Preferably, the method further includes storing the acquired cluster states and the acquired scene information corresponding to the unmanned aerial vehicles in the game countermeasure mechanism library, for optimizing and updating the intention interpretation model and the cluster average field random game model.
Preferably, the intention interpretation model is generated by training in the following way: acquiring a training data set, wherein the training data set includes a plurality of groups of data samples, each data sample including a plurality of sample blue unmanned aerial vehicle track sequences and the corresponding blue unmanned aerial vehicle cluster state; and constructing a target fuzzy neural network model, wherein the target fuzzy neural network model includes an input layer, a fuzzification layer, a fuzzy inference layer and an output layer. The fuzzification layer includes a preset number of fuzzy nodes determined according to the statistical number of combat modes, each fuzzy node corresponding to a different membership function:

$$\mu_{ij}\left(x_i\right)=\exp\left(-\left(\frac{x_i-c_{ij}}{\sigma_{ij}}\right)^{2}\right)$$

wherein $x_i$ is the track sequence corresponding to input node $i$ in the input layer, $\mu_{ij}$ is the membership function corresponding to fuzzy node $j$ connected to input node $i$, $c_{ij}$ is the first target parameter, and $\sigma_{ij}$ is the second target parameter. The fuzzy inference layer includes a plurality of inference nodes, and the calculation rule of each inference node is:

$$w_j=\prod_{i}\mu_{ij}\left(x_i\right)$$
the output layer comprises a plurality of output nodes, and the definition function formula of each output node is as follows:
wherein ,is a third target parameter; and inputting the training data set into the constructed target fuzzy neural network model, and adjusting a first target parameter, a second target parameter and a third target parameter in the target fuzzy neural network model based on a mixed algorithm combining back propagation and a least square method so as to acquire a pre-trained intention interpretation model.
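The described layout (parameterized membership functions, product-rule inference, normalized weighted output, hybrid backpropagation/least-squares fitting) matches the standard ANFIS-style fuzzy network. A minimal forward pass under the assumption of Gaussian membership functions — the patent's exact forms are not recoverable from this text:

```python
import math

def gaussian_mu(x, c, sigma):
    """Membership function with center c (first target parameter) and
    width sigma (second target parameter)."""
    return math.exp(-((x - c) / sigma) ** 2)

def fuzzy_forward(x, centers, sigmas, p):
    """One forward pass of a Sugeno-style fuzzy network.

    x: input vector; centers/sigmas: per-rule parameter lists;
    p: consequent parameters (third target parameter).
    Fuzzify each input, combine per rule by product (inference layer),
    then output the firing-strength-weighted mean of p."""
    w = [math.prod(gaussian_mu(xi, c[i], s[i]) for i, xi in enumerate(x))
         for c, s in zip(centers, sigmas)]
    z = sum(w)
    return sum(wj * pj for wj, pj in zip(w, p)) / z
```

In ANFIS-style training, the consequent parameters `p` are fitted by least squares and the membership parameters by backpropagation, matching the hybrid algorithm named above.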
In a second aspect, the present application provides an unmanned aerial vehicle game countermeasure policy generation system, the system comprising a control module configured to determine, for each red unmanned aerial vehicle, the action to be executed by the red unmanned aerial vehicle at the next moment and to control the red unmanned aerial vehicle to move according to the determined action. The control module includes: a track prediction unit configured to acquire a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle and input it into a pre-trained track prediction model to output a track prediction result for each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle; an attack target prediction unit configured to input the track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model to output an attack target prediction result for each target blue unmanned aerial vehicle; an intention interpretation unit configured to input the track prediction results and the attack target prediction results of all target blue unmanned aerial vehicles into a pre-trained intention interpretation model to output the blue unmanned aerial vehicle cluster state; and a game countermeasure unit configured to input the blue unmanned aerial vehicle cluster state, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicles at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model to output the preferred action of the red unmanned aerial vehicle, and to control the red unmanned aerial vehicle to move according to the determined preferred action; wherein a target blue unmanned aerial vehicle is a blue unmanned aerial vehicle within the monitoring range of the red unmanned aerial vehicle, and the other red unmanned aerial vehicles are red unmanned aerial vehicles within the monitoring range of the red unmanned aerial vehicle.
In a third aspect, the present application also provides an electronic device, comprising a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the unmanned aerial vehicle game countermeasure policy generation method described above.
In a fourth aspect, the present application also provides a computer-readable storage medium having a computer program stored thereon which, when executed by a processor, performs the steps of the unmanned aerial vehicle game countermeasure policy generation method described above.
The method for generating an unmanned aerial vehicle game countermeasure strategy provided by the application determines, for each red unmanned aerial vehicle, the action to be executed at the next moment and controls the red unmanned aerial vehicle accordingly: a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle is input into a pre-trained track prediction model to output a track prediction result for each target blue unmanned aerial vehicle; the track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment are input into a pre-trained attack target prediction model to output an attack target prediction result for each target blue unmanned aerial vehicle; the track prediction results and the attack target prediction results of all target blue unmanned aerial vehicles are input into a pre-trained intention interpretation model to output the blue unmanned aerial vehicle cluster state; and the blue unmanned aerial vehicle cluster state, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicles at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment are input into a pre-trained cluster average field random game model to output the preferred action of the red unmanned aerial vehicle, which is then controlled to move accordingly; a target blue unmanned aerial vehicle is a blue unmanned aerial vehicle within the monitoring range of the red unmanned aerial vehicle, and the other red unmanned aerial vehicles are red unmanned aerial vehicles within its monitoring range. Because the behaviors and intents of the blue unmanned aerial vehicles are analyzed through deep learning before strategy generation, and the red side's game countermeasure strategy is dynamically generated and adjusted, both the accuracy and the timeliness of decisions are improved.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered limiting of the scope; other related drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of a method for generating a game countermeasure policy for an unmanned aerial vehicle according to an embodiment of the present application;
Fig. 2 is a flowchart illustrating steps for determining a preferred action according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a scenario for generating a game countermeasure strategy according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a first unmanned aerial vehicle game countermeasure simulation according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a second unmanned aerial vehicle game countermeasure simulation according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a change in the number of unmanned aerial vehicles according to an embodiment of the present application;
Fig. 7 is a block diagram of an unmanned aerial vehicle game countermeasure policy generation system according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. Based on the embodiments of the present application, every other embodiment obtained by a person skilled in the art without making any inventive effort falls within the scope of protection of the present application.
First, an application scenario to which the present application is applicable will be described. The method can be applied to game countermeasure strategy generation for cooperative unmanned aerial vehicle combat, and is particularly suitable for generating control strategies for rotary-wing unmanned aerial vehicles.
Research has found that the autonomous generation technology for unmanned aerial vehicle game defense strategies refers to a technology by which an unmanned aerial vehicle cluster automatically generates a game strategy in a combat environment, based on the battlefield situation and the perceived information of both friendly and enemy sides, so as to counter the enemy's combat intent, protect friendly ground targets and achieve the friendly combat objective. In the prior art, for example, Lei et al. put forward an optimal strategy based on complete information and Markov models to realize attack and defense against a moving target; Carter et al. considered the dynamic switching of game models under different attack scenarios and thereby proposed a strategy generation algorithm; Garcia et al. modeled the unmanned aerial vehicle cluster game problem as a differential game problem of cluster strikes and, by establishing a whole-process performance function and giving a strike capability evaluation function, derived the anti-strike guidance law for the unmanned aerial vehicle cluster in that scenario.
Based on this, embodiments of the present application provide a method, a system and an electronic device for generating an unmanned aerial vehicle game countermeasure strategy, so as to improve the accuracy of unmanned aerial vehicle game defense strategy generation.
Referring to fig. 1 and 3, fig. 1 is a flowchart of a method for generating a game countermeasure policy for an unmanned aerial vehicle according to an embodiment of the present application, and fig. 3 is a schematic diagram of a scenario for generating a game countermeasure policy according to an embodiment of the present application. As shown in fig. 1, the method for generating the game countermeasure policy of the unmanned aerial vehicle provided by the embodiment of the application includes:
For each red unmanned aerial vehicle, the action to be executed by the red unmanned aerial vehicle at the next moment is determined, and the red unmanned aerial vehicle is controlled to move according to the determined action, in the following manner:
Here, in one embodiment of the present application, the action executed by the red unmanned aerial vehicle changes dynamically; that is, the action each red unmanned aerial vehicle executes at the next moment is determined based on the historical data collected by that red unmanned aerial vehicle. The historical data here includes, but is not limited to, the states of the blue unmanned aerial vehicles and the scene information collected by the red unmanned aerial vehicle over a period of time. Actions here include, but are not limited to, the speed and attitude of the unmanned aerial vehicle, whether to attack, and the like.
S101, acquiring a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle, and inputting it into a pre-trained track prediction model to output a track prediction result for each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle.
In step S101, the track prediction model is trained and generated based on an LSTM (Long Short-Term Memory) neural network model. The input of the track prediction model is the track sequence of each target blue unmanned aerial vehicle; a target blue unmanned aerial vehicle is any blue unmanned aerial vehicle monitored within the sensing range of the red unmanned aerial vehicle. The historical track sequence may be continuous or discrete, for example the continuous three-dimensional position coordinates of the target blue unmanned aerial vehicle over the past 10 seconds, or its three-dimensional position coordinates at the 8th, 5th and 3rd seconds. The output of the track prediction model is the predicted track sequence corresponding to each input target blue unmanned aerial vehicle, for example its three-dimensional position coordinates within the next 5 seconds or the next 3 seconds; the shorter the predicted track sequence, the more favorable it is for the prediction in step S102.
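To make the LSTM step concrete, here is a single cell update and an untrained position-prediction head in pure Python. The weight layout and the linear output head are illustrative assumptions (a real model would be trained in a deep-learning framework); with untrained weights the output is only shape-correct, not accurate:

```python
import math

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step; x, h, c are plain Python lists.

    W maps gate name ('i', 'f', 'o', 'g') to a weight matrix over the
    concatenated vector [h, x]; b maps gate name to a bias vector."""
    v = h + x  # concatenate previous hidden state and current input
    def lin(g):
        return [sum(wi * vi for wi, vi in zip(row, v)) + bg
                for row, bg in zip(W[g], b[g])]
    i = [_sigmoid(z) for z in lin('i')]   # input gate
    f = [_sigmoid(z) for z in lin('f')]   # forget gate
    o = [_sigmoid(z) for z in lin('o')]   # output gate
    g = [math.tanh(z) for z in lin('g')]  # candidate cell state
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def predict_next_position(history, W, b, W_out, hidden=4):
    """Run the LSTM over a history of 3-D positions and project the
    final hidden state to a predicted next position."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for pos in history:
        h, c = lstm_step(list(pos), h, c, W, b)
    return [sum(wi * hi for wi, hi in zip(row, h)) for row in W_out]
```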
S102, inputting track prediction results of all target blue-side unmanned aerial vehicles corresponding to all red-side unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model so as to output attack target prediction results of each target blue-side unmanned aerial vehicle.
The attack target prediction model is trained and generated using an IDCNN (Iterated Dilated Convolutional Neural Network) model. The input of the attack target prediction model is the track prediction result of each target blue unmanned aerial vehicle output by the track prediction model. The output of the attack target prediction model is the attack target prediction result of each target blue unmanned aerial vehicle; for example, the attack target of target blue unmanned aerial vehicle A is red unmanned aerial vehicle B, the attack target of target blue unmanned aerial vehicle C is red ground target D, and so on. The attack target at least includes a red unmanned aerial vehicle or a red ground target.
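The building block an IDCNN iterates is the dilated (atrous) convolution, which spaces kernel taps apart to widen the receptive field without adding parameters. A minimal 1-D sketch of that operation (illustrative only; a real IDCNN stacks several such layers with increasing dilation and repeats the stack):

```python
def dilated_conv1d(seq, kernel, dilation):
    """1-D dilated convolution: kernel taps are spaced `dilation`
    steps apart. With dilation=1 this is an ordinary valid-mode
    cross-correlation."""
    span = (len(kernel) - 1) * dilation  # input width covered by the kernel
    return [sum(k * seq[i + j * dilation] for j, k in enumerate(kernel))
            for i in range(len(seq) - span)]
```

For example, a two-tap averaging kernel with dilation 2 combines every input with the value two steps ahead.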
S103, inputting track prediction results of all the target blue unmanned aerial vehicles and attack target prediction results of all the target blue unmanned aerial vehicles into a pre-trained intention interpretation model so as to output a blue unmanned aerial vehicle cluster state.
The intention interpretation model is generated by training a fuzzy neural network model. The input of the intention interpretation model is the behavior sequence of each target blue unmanned aerial vehicle, determined from the output results of the track prediction model and the attack target prediction model. The output of the intention interpretation model is the blue unmanned aerial vehicle cluster state, where the cluster state at least includes information such as formation, grouping and combat mode, for example: a first formation grouping to strike our ground targets, a second formation clustering to reconnoitre our information, a third formation clustering to jam our unmanned aerial vehicle B, and the like.
S104, inputting the cluster state of the blue unmanned aerial vehicle, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment and the motion state of the red unmanned aerial vehicle at the current moment into a cluster average field random game model trained in advance so as to output the preferred motion of the red unmanned aerial vehicle and control the red unmanned aerial vehicle to move according to the determined preferred motion.
The target blue unmanned aerial vehicle is a blue unmanned aerial vehicle in the red unmanned aerial vehicle monitoring range, and other red unmanned aerial vehicles are red unmanned aerial vehicles in the red unmanned aerial vehicle monitoring range.
The cluster mean field random game model is generated through reinforcement-learning-based training by an expert-knowledge-aided mean field cluster game strategy generation algorithm. Inputs of the cluster mean field random game model comprise the enemy situation (relative geometric information between pairs of aircraft), the enemy unmanned aerial vehicle cluster state, and the like. The output of the cluster mean field random game model is the preferred action to be executed by the red unmanned aerial vehicle at the next moment.
As shown in fig. 2, fig. 2 is a flowchart illustrating steps for determining a preferred action according to an embodiment of the present application. Specifically, the cluster mean field random game model outputs the preferred action of each red unmanned aerial vehicle through the following steps:
S201, determining the action space of the red unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the state of the blue unmanned aerial vehicle cluster.
The blue-side unmanned aerial vehicle cluster state at least comprises a formation, a grouping and a combat mode, and the step of determining the action space of the red-side unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the blue-side unmanned aerial vehicle cluster state specifically comprises the following steps:
according to the cluster state of the blue unmanned aerial vehicle and the number of the red unmanned aerial vehicles, matching a corresponding unmanned aerial vehicle cluster countermeasure scheme from a game countermeasure mechanism library, wherein the game countermeasure mechanism library comprises a plurality of unmanned aerial vehicle cluster countermeasure schemes, and each unmanned aerial vehicle cluster countermeasure scheme is used for indicating each red unmanned aerial vehicle to execute actions according to time sequence arrangement. And determining the execution actions which are arranged according to the time sequence and correspond to each red unmanned aerial vehicle according to the matched unmanned aerial vehicle cluster countermeasure scheme so as to generate an action space of the red unmanned aerial vehicle cluster.
The game countermeasure library may be pre-established, and the game countermeasure mechanism library includes a plurality of unmanned aerial vehicle cluster countermeasure schemes, where the unmanned aerial vehicle cluster countermeasure schemes are used to instruct each red unmanned aerial vehicle in the cluster to execute actions according to time sequence, for example, move to enemies according to a predetermined patrol formation, catch attack enemies according to 2V1, and so on.
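The matching of a countermeasure scheme from the game countermeasure mechanism library can be sketched as a simple keyed lookup. The library keys, scheme contents and function names below are hypothetical illustrations, not the patented data structures:

```python
# Hypothetical sketch of a game-countermeasure mechanism library: each entry
# maps (blue combat mode, required red-UAV count) to per-UAV, time-ordered
# action lists. All keys and actions are illustrative placeholders.
MECHANISM_LIBRARY = {
    ("strike_ground", 4): {"uav_1": ["intercept", "pursue"],
                           "uav_2": ["intercept", "pursue"],
                           "uav_3": ["flank", "pursue"],
                           "uav_4": ["standby", "intercept"]},
    ("recon", 2): {"uav_1": ["jam", "shadow"],
                   "uav_2": ["shadow", "jam"]},
}

def match_scheme(blue_combat_mode, n_red):
    """Return the action space (per-UAV time-ordered actions) of the fullest
    scheme whose required red-UAV count the current cluster can satisfy."""
    candidates = [(mode, need) for (mode, need) in MECHANISM_LIBRARY
                  if mode == blue_combat_mode and need <= n_red]
    if not candidates:
        return None
    key = max(candidates, key=lambda k: k[1])   # prefer the fullest scheme
    return MECHANISM_LIBRARY[key]

action_space = match_scheme("strike_ground", n_red=5)
```

The lookup returns the time-ordered execution actions for each red unmanned aerial vehicle, which together form the action space of the red cluster.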
The unmanned aerial vehicle cluster countermeasure scheme herein may be expressed as a set U = \{u_1, u_2, \dots, u_N\}, wherein u_i denotes the i-th red unmanned aerial vehicle. The action space here can be expressed as A = \{a_1, a_2, \dots, a_M\}; at any decision moment, the action a_i to be taken by unmanned aerial vehicle u_i relates not only to itself but also to the whole cluster.
S202, determining Markov transition probability distribution of the red unmanned aerial vehicle according to the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment and the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment.
The relative position information comprises a sight angle between two unmanned aerial vehicles, an entry angle between a speed vector of a target unmanned aerial vehicle and the sight, an included angle between speeds of the two unmanned aerial vehicles, a distance between the two unmanned aerial vehicles and a relative speed between the two unmanned aerial vehicles, and the Markov transition probability distribution of the red unmanned aerial vehicle is determined in the following mode:
the total potential field energy E_i of the red unmanned aerial vehicle is determined by the following formula:

E_i = \sum_{j \in A_i} \left( E^{\theta}_{ij} + E^{d}_{ij} + E^{v}_{ij} \right) + \sum_{k \in B_i} \left( \bar{E}^{\theta}_{ik} + \bar{E}^{d}_{ik} + \bar{E}^{v}_{ik} \right)

wherein E^{\theta}_{ij} is the angle cooperative potential field between this red unmanned aerial vehicle i and another red unmanned aerial vehicle j, E^{d}_{ij} is the distance cooperative potential field between them, and E^{v}_{ij} is the speed cooperative potential field between them; \bar{E}^{\theta}_{ik} is the angle power potential field between this red unmanned aerial vehicle i and the target blue unmanned aerial vehicle k, \bar{E}^{d}_{ik} is the distance power potential field between them, and \bar{E}^{v}_{ik} is the speed power potential field between them; A_i is the set of red unmanned aerial vehicles corresponding to red unmanned aerial vehicle i, and B_i is the corresponding set of blue unmanned aerial vehicles;
determining the Markov transition probability distribution of the red unmanned aerial vehicle as

P\left(s^{i}_{t+1} \mid s^{i}_{t}, a^{i}_{t}, E_i\right)

wherein s^{i}_{t} is the motion state of this red unmanned aerial vehicle i at the current moment t, a^{i}_{t} is the execution action of this red unmanned aerial vehicle i at the current moment t, and E_i is its total potential field energy.
The mean field cluster game strategy generation algorithm here, which may be expert-knowledge aided, is based on reinforcement learning. The perception information acquired by unmanned aerial vehicle i is mainly the relative geometric information between two aircraft, which can be expressed as (\lambda, \varphi, d, \theta, v_r), respectively representing the line-of-sight angle between the speed vector of unmanned aerial vehicle i and the line of sight between the two aircraft, the target entry angle between the speed vector of the target unmanned aerial vehicle and the line of sight, the distance between the two aircraft, the angle between the speeds of the two aircraft, and the relative speed of the two aircraft.
Consider a cluster of N unmanned aerial vehicles. Denote the set of the N_a friendly unmanned aerial vehicles nearest to the i-th unmanned aerial vehicle as A_i, and the set of the N_b enemy unmanned aerial vehicles nearest to it as B_i. Introducing the potential field concept, define the potential field radiated to my i-th unmanned aerial vehicle by unmanned aerial vehicle j in my cluster as the cooperative potential fields E^{\theta}_{ij}, E^{d}_{ij}, E^{v}_{ij}, respectively representing angular uniformity, distance uniformity and speed uniformity. At the same time, define the potential field radiated to my i-th unmanned aerial vehicle by unmanned aerial vehicle k in the enemy cluster as the power potential fields \bar{E}^{\theta}_{ik}, \bar{E}^{d}_{ik}, \bar{E}^{v}_{ik}, respectively representing enemy angle power, distance power and speed power.
The step establishes interaction among clustered individuals, and can calculate the total potential field energy and the quantitative situation of the clustered individuals according to the relative position relation with the neighborhood individuals, so as to adopt maneuvering strategies.
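A minimal sketch of this step follows. The concrete potential-field functions (cosine angle terms, quadratic distance and speed terms) and the Boltzmann-style mapping from energies to a transition distribution are assumptions for illustration; the embodiment does not fix these forms:

```python
import numpy as np

# Illustrative total potential-field energy: cooperative terms over
# neighbouring red UAVs plus power terms over nearby blue UAVs. The field
# functions below are assumed, not taken from the embodiment.
def pair_energy(rel_angle, rel_dist, rel_speed, d_ref):
    # one (angle + distance + speed) contribution for a neighbouring UAV
    return (1.0 - np.cos(rel_angle)) + (rel_dist - d_ref) ** 2 + rel_speed ** 2

def total_energy(red_neighbors, blue_neighbors, d_coop=10.0, d_attack=5.0):
    e = sum(pair_energy(a, d, v, d_coop) for a, d, v in red_neighbors)
    e += sum(pair_energy(a, d, v, d_attack) for a, d, v in blue_neighbors)
    return e

def transition_distribution(energies):
    # Assumed Boltzmann-style mapping from candidate next-state energies to
    # a Markov transition distribution (lower energy -> more likely).
    w = np.exp(-np.asarray(energies, dtype=float))
    return w / w.sum()

# two candidate next states: the first keeps better cooperative spacing
p = transition_distribution(
    [total_energy([(0.1, 9.0, 0.2)], [(0.3, 6.0, 0.5)]),
     total_energy([(0.1, 12.0, 0.2)], [(0.3, 6.0, 0.5)])])
```

With this mapping, candidate maneuvers that lower the total potential field energy receive higher transition probability, which is one way to quantify the "situation" described above.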
S203, taking the Markov transition probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables, taking the execution action of the red unmanned aerial vehicle as the dependent variables, and solving the execution action meeting the Nash equilibrium condition in the action space as the preferable action of the red unmanned aerial vehicle.
The method specifically comprises the steps of taking potential field probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables, taking the execution action of the red unmanned aerial vehicle as the dependent variables, and solving the execution action meeting Nash equilibrium conditions in an action space as the preferable action of the red unmanned aerial vehicle, wherein the steps comprise the following steps:
Determining the execution action meeting the Nash equilibrium condition as the preferable action of the red unmanned aerial vehicle through the following formula:

a^{i*}_{t} = \arg\max_{a \in A} \; \mathbb{E}\left[ \sum_{t} \gamma^{t} r^{i}_{t} \right]

wherein r^{i}_{t} is the reward of this red unmanned aerial vehicle i, a^{i*}_{t} is the preferable action of this red unmanned aerial vehicle i, \gamma is the discount rate, and A is the action space.
Here the reward r^{i}_{t} is defined through the overall impact of the cluster on the individual, and the Markov transition probability for the individual state is defined as

P\left(s^{i}_{t+1} \mid s^{i}_{t}, a^{i}_{t}, E_i\right)

wherein s^{i}_{t} is the current unmanned aerial vehicle state. For reward r^{i}_{t} and discount rate \gamma, the objective of unmanned aerial vehicle i is to maximize the expected cumulative discounted reward function, i.e.

\max \; \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} r^{i}_{t} \right]
The process establishes an average field random game model of the clusters, rewards obtained by the clusters are influenced by state probability distribution of other unmanned aerial vehicles, and the clusters are mutually influenced through state and income functions.
For a cluster system containing N unmanned aerial vehicles, given the state s^{i} of each unmanned aerial vehicle i and the probability distribution resulting from the potential field, determine an optimal strategy \pi^{i*} to satisfy the Nash equilibrium condition:

V^{i}\left(\pi^{i*}, \pi^{-i*}\right) \ge V^{i}\left(\pi^{i}, \pi^{-i*}\right), \quad \forall \pi^{i}

the formula defining the value function is:

V^{i}(s) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} r^{i}_{t} \,\middle|\, s^{i}_{0} = s \right]

and the Bellman formula can be obtained according to the dynamic programming principle as follows:

V^{i}(s) = \max_{a \in A} \left[ r^{i}(s, a) + \gamma \sum_{s'} P\left(s' \mid s, a, E_i\right) V^{i}(s') \right]
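The Bellman recursion can be solved numerically by value iteration. The sketch below runs it on a fabricated 3-state, 2-action MDP standing in for a single unmanned aerial vehicle's sequential decision problem; all rewards and transition probabilities are illustrative only:

```python
import numpy as np

# Toy value iteration on V(s) = max_a [ r(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
# for a fabricated 3-state, 2-action MDP (state 2 is an absorbing "goal").
n_states, n_actions, gamma = 3, 2, 0.9
P = np.zeros((n_states, n_actions, n_states))
P[0, 0] = [0.8, 0.2, 0.0]; P[0, 1] = [0.1, 0.1, 0.8]
P[1, 0] = [0.0, 1.0, 0.0]; P[1, 1] = [0.0, 0.2, 0.8]
P[2, 0] = [0.0, 0.0, 1.0]; P[2, 1] = [0.0, 0.0, 1.0]
R = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0]])  # r(s, a)

V = np.zeros(n_states)
for _ in range(500):                 # iterate to a (near) fixed point
    Q = R + gamma * (P @ V)          # Q[s, a], one Bellman backup
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new
policy = Q.argmax(axis=1)            # greedy (preferred) action per state
```

The greedy policy extracted from the converged value function is the optimal sequential decision for this toy problem; the embodiment's cluster version additionally conditions the transitions on the potential-field distribution.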
when the method is used for solving the problem, the state transition probability needs to be obtained, meanwhile, the design of rewarding and punishing functions is needed in the training process, and quantized data is obtained through conversion of expert knowledge. Solving the problems can obtain the optimal sequential decision of the single unmanned aerial vehicle, and the whole cluster can emerge the cluster behavior such as autonomous formation, grouping number, killing mode and the like by considering the whole cluster.
According to the unmanned aerial vehicle game countermeasure strategy generation method provided by the embodiment of the application, the behavior and the intention of the blue unmanned aerial vehicle are analyzed through deep learning before strategy generation, so that the game countermeasure strategy of the red unmanned aerial vehicle is dynamically generated and adjusted, and the accuracy and timeliness of decision making are improved.
In one embodiment of the application, the trajectory prediction model and the attack target prediction model are generated by training the following processes:
aiming at the dynamics of the enemy clusters, neural-network-based spatial situation prediction modeling, inversion and online optimization are developed to realize dynamic prediction of enemy cluster behavior characteristics.
Aiming at problems such as the rapid change of battlefield situation, the difficulty of prediction in the game countermeasure process, and the multiple constraints of the game model, in this embodiment a neural network model is designed based on a long short-term memory (LSTM) neural network to predict the attack path of the enemy cluster, and a classifier for predicting the attack target is then designed based on the LSTM and an iterated dilated convolutional neural network (IDCNN). Target track prediction based on deep learning requires no model of the target in use, overcoming the defect of traditional algorithms that acceleration changes are difficult to predict when target aerodynamic parameters are unknown.
Specifically, the LSTM layer first extracts time-sequence characteristics of the blue unmanned aerial vehicle track, and the IDCNN layer then searches for local characteristics, thereby completing classification of targets. For track prediction, LSTM has the characteristics of high calculation speed and good real-time performance, so the track prediction network is designed with LSTM alone. Flight tracks of enemy aircraft over a period of time are collected to construct a track database for training and generating the track prediction model.
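Constructing such a track database can be sketched as slicing collected tracks into history/future window pairs. The window lengths and the random-walk stand-in for a real flight track are assumptions for illustration:

```python
import numpy as np

# Turn a collected enemy flight track into supervised pairs for trajectory
# prediction: each input is a window of past positions, each label the
# positions that follow it. Window sizes are assumed, not from the patent.
def make_windows(track, n_past=8, n_future=4):
    """track: (T, 3) array of positions. Returns (X, Y) with
    X: (N, n_past, 3) history windows, Y: (N, n_future, 3) targets."""
    T = len(track)
    xs, ys = [], []
    for t in range(T - n_past - n_future + 1):
        xs.append(track[t:t + n_past])
        ys.append(track[t + n_past:t + n_past + n_future])
    return np.stack(xs), np.stack(ys)

# a random-walk track standing in for a recorded enemy trajectory
track = np.cumsum(np.random.default_rng(0).normal(size=(50, 3)), axis=0)
X, Y = make_windows(track)
```

Each (X, Y) pair is one training sample for the LSTM track predictor; at inference time only the history window is available.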
In the LSTM, first, the data retained from the previous moment is determined; this part consists of a forget gate, whose function expression is:

f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)

wherein h_{t-1} and x_t respectively represent the output value of the LSTM network at the previous moment and the input value at the current moment, \sigma represents the activation function, W_f represents the forget gate weight, b_f indicates the forget gate bias, and f_t is the forget gate output vector.
The forget gate decides how much of the previous-moment state is transmitted by converting the combination of the input and the previous-moment state into a value between 0 and 1 through the activation function. The input gate then decides the update information of the cell state; the input gate function expressions are:

i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)

\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)

wherein W_i and b_i respectively represent the weight and bias of the input gate, W_C and b_C respectively represent the weight and bias for extracting effective information, \tilde{C}_t represents the current candidate value to be updated into the cell state, and the input gate weight i_t controls which features of \tilde{C}_t are eventually updated into the cell state C_t. The cell state function expression is:

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
Finally, the final result is output by the output gate, whose function expressions are:

o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)

h_t = o_t \odot \tanh(C_t)

wherein W_o and b_o are the weight and bias of the output gate, the first formula calculates the output gate weight o_t from the state at the current moment, and h_t is the final LSTM output, determined by the output gate weight and the cell state. Through these three gating units and the cell state transfer, the LSTM can process time-sequence problems and can further be used for scenes such as time-sequence track prediction.
For the attack target prediction model, the currently available partial track data and expert data are first input into the trained track prediction model to obtain a plurality of track prediction results; these track prediction results are then used as the input of the classifier, and the attack target is predicted by the classifier.
In one embodiment of the application, aiming at characteristics of the enemy clusters such as intent antagonism, high dynamics and deception, interpretation and modeling analysis of enemy intent under cluster game countermeasure conditions are developed based on the acquired situation information and an expert knowledge base, adopting a fuzzy neural network and an inverse reinforcement learning framework. A tactical intention model is constructed based on the fuzzy neural network; countermeasure samples formed from enemy target attributes and the corresponding tactical intentions are used for training, and neural networks formed from different source data are combined to realize intelligent reasoning of the enemy combat intention, from whose basic actions the enemy cluster state is deduced, thereby improving the accuracy and rapidity of interpretation. Based on the inverse reinforcement learning framework, an intention equilibrium solution analysis is given, the confidence of the predicted trend is further evaluated, and the interpretability is further improved, supporting the planning and generation of my gaming strategy. The intent interpretation model here is trained and generated as follows:
And acquiring a training data set, wherein the training data set comprises a plurality of groups of data samples, and each data sample comprises a track sequence of a plurality of sample blue unmanned aerial vehicles and a corresponding blue unmanned aerial vehicle cluster state.
The fuzzy neural network system can be built on a fuzzy system fused with a neural network and converted into an adaptive network to realize the learning process of the T-S (Takagi-Sugeno) fuzzy type.
And constructing a target fuzzy neural network model, wherein the target fuzzy neural network model comprises an input layer, a fuzzy reasoning layer and an output layer.
Firstly, the input layer is constructed, and the input vector fed into the network through the nodes of the input layer is recorded as x = (x_1, x_2, \dots, x_n).
Then, the fuzzification layer is constructed. The fuzzification layer comprises a preset number of fuzzy nodes determined according to the statistical number of combat modes, and each fuzzy node corresponds to a different membership function, whose formula is:

\mu_{ij}(x_i) = \exp\left( -\frac{(x_i - c_{ij})^2}{\sigma_{ij}^2} \right)

wherein x_i is the track-sequence input corresponding to input node i in the input layer, \mu_{ij} is the membership function corresponding to the j-th fuzzy node connected to input node i, c_{ij} is the first target parameter, and \sigma_{ij} is the second target parameter. Each input node corresponds to m fuzzy nodes, and the membership degrees of input node i are denoted \mu_{i1}, \mu_{i2}, \dots, \mu_{im}.
A fuzzy inference layer is then built; each inference node represents a rule (i.e., an enemy intent such as interference, strike or kill) and is used for computing the fitness (firing strength) of the rule.
Specifically, the fuzzy inference layer includes a plurality of inference nodes, and the calculation rule formula of each inference node is:

\alpha_j = \prod_{i=1}^{n} \mu_{ij}(x_i)
and finally, building a definition layer, and clearing and outputting the data of the fuzzy reasoning layer.
The output layer comprises a plurality of output nodes, and the defuzzification function formula of each output node is:

y = \sum_{j} \bar{\alpha}_j w_j, \qquad \bar{\alpha}_j = \frac{\alpha_j}{\sum_{k} \alpha_k}

wherein w_j is the third target parameter. The training data set is input into the constructed target fuzzy neural network model, and the first target parameter, the second target parameter and the third target parameter in the target fuzzy neural network model are adjusted based on a hybrid algorithm combining back propagation and the least squares method, so as to obtain the pre-trained intent interpretation model.
Specifically, in the training stage, the number of nodes in each layer needs to be determined first, the fuzzy inference layer is given through specific operation rules, and the parameters to be learned are mainly the first, second and third target parameters. Then, given the target attributes obtained after sensor data fusion and the corresponding tactical intent countermeasure samples, the target behavior intent is comprehensively calculated from different data sources and converted into the training data set. After data preparation is finished, the parameter training stage begins: parameters are trained and learned through back propagation or a hybrid algorithm of back propagation and the least squares method, adjusting the parameters of the system. In the hybrid algorithm, least squares estimation is used to identify the weight parameters when the forward pass reaches the defuzzification layer, the error signal is propagated backwards in the reverse pass, and the membership function parameters are updated by back propagation. The hybrid method reduces the search space of the back propagation method, thereby improving the training speed of the fuzzy neural network.
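A forward pass through such a network can be sketched in NumPy, assuming Gaussian membership functions (centres and widths as the first and second target parameters), product-rule inference, and a normalized weighted sum with the third target parameters as rule weights:

```python
import numpy as np

# ANFIS-style forward pass sketch of the intent-interpretation network.
# Concrete shapes and parameter values are illustrative, not trained.
def fuzzy_forward(x, centres, widths, rule_weights):
    """x: (n_in,); centres/widths: (n_in, n_rules); rule_weights: (n_rules,)."""
    # fuzzification layer: Gaussian membership of each input to each rule
    mu = np.exp(-((x[:, None] - centres) ** 2) / widths ** 2)
    alpha = mu.prod(axis=0)            # inference layer: rule firing strengths
    alpha_bar = alpha / alpha.sum()    # normalization
    return alpha_bar @ rule_weights    # defuzzified output

rng = np.random.default_rng(1)
n_in, n_rules = 4, 3                   # assumed toy dimensions
y = fuzzy_forward(rng.normal(size=n_in),
                  rng.normal(size=(n_in, n_rules)),     # first target params
                  np.full((n_in, n_rules), 1.5),        # second target params
                  np.array([0.0, 1.0, 2.0]))            # third target params
```

Training would adjust the centres/widths by back propagation and the rule weights by least squares, as described above; here they are fixed for the sake of a runnable sketch.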
In one embodiment of the application, the method further comprises the step of storing the acquired cluster states and the acquired scene information corresponding to the unmanned aerial vehicle of the red and blue parties in a game countermeasure mechanism library for optimizing and updating an intention interpretation model and a cluster average field random game model.
And constructing simulation scenes according to different cluster combat task requirements, and constructing an expert system strategy generator. On the premise that the enemy clusters randomly select game strategies, the enemy clusters collect battlefield situation information, an expert system is utilized to generate game countermeasure strategies, a game countermeasure mechanism library is constructed, strategy selection and a battlefield evolution process are stored, and data support is provided for subsequent game algorithms and strategy model design.
In one embodiment of the application, a simulation instance is provided that applies unmanned game countermeasure policy generation to a typical scenario.
The specific scene settings are as follows. Enemy cluster: 10 unmanned aerial vehicles strike the 2 ground target areas on my side according to a fixed strategy from a certain height 1500 m away. My cluster: 20 unmanned aerial vehicles are in a loitering standby flight state at a preset position (500 m from the target areas, at a certain height). The interval between the 2 ground target areas on my side is 100 m, and the area of each ground target area is 10.
The parameters are set as follows. Field-of-view distance of my/enemy unmanned aerial vehicles: 150 m, 100 m; maximum speed of my/enemy unmanned aerial vehicles: 5, 4; the other unmanned aerial vehicle dynamic parameters are set according to typical parameters.
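For reproducibility, the scenario and parameter settings above can be gathered into one configuration structure; the layout and key names are assumptions, while the numeric values are those quoted in the text:

```python
# Scenario configuration sketch (structure assumed; values from the text).
SCENARIO = {
    "enemy": {"n_uavs": 10, "strategy": "fixed", "spawn_distance_m": 1500,
              "field_of_view_m": 100, "max_speed": 4},
    "friendly": {"n_uavs": 20, "state": "loiter_standby",
                 "spawn_distance_m": 500,
                 "field_of_view_m": 150, "max_speed": 5},
    "ground_targets": {"count": 2, "spacing_m": 100},
}

def outnumber_ratio(cfg):
    # friendly-to-enemy numerical advantage in this engagement
    return cfg["friendly"]["n_uavs"] / cfg["enemy"]["n_uavs"]
```

Loading such a structure in the initialization stage keeps the game countermeasure scene, cluster sizes and sensing limits in one place for the simulation loop.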
In the initialization stage, firstly, a game countermeasure scene is established, and then, a cluster dynamics nonlinear mathematical model is established for participating in a game countermeasure simulation process. And executing an initialization strategy by the clusters of the two parties after loading the dynamic model, updating the detection state in real time, and entering a game countermeasure link once the cluster individuals detect the other party cluster individuals.
In the game countermeasure link, the enemy cluster strategy library is fixed and assigned by groups in the initialization stage; after detecting a my-cluster individual, the enemy cluster wakes up the tasks within its groups and executes them. Unlike the enemy cluster strategy, the my cluster strategy is variable: firstly, the cluster behavior and track prediction module is used to realize behavior prediction according to the enemy cluster situation; on the basis of the prediction result, intent interpretation is carried out on the enemy cluster, and according to the enemy cluster situation, expert knowledge is used to assist the intelligently learned game strategy, and a dominant strategy is selected from the strategy library to play against the enemy cluster.
Fig. 4 is a schematic diagram of a first unmanned aerial vehicle game countermeasure simulation according to an embodiment of the present application. Fig. 5 is a schematic diagram of a second unmanned aerial vehicle game countermeasure simulation according to an embodiment of the present application. Fig. 6 is a schematic diagram of the change in number of unmanned aerial vehicles according to an embodiment of the present application. The gaming stage is shown in fig. 4 and fig. 5, where triangles represent my clusters and circles represent enemy clusters. At the beginning, the my cluster moves toward the enemy according to the preset patrol formation, and the enemy cluster forms a certain formation grouping to strike the ground targets (see fig. 4). In the game stage, the my cluster first finds enemy cluster individuals in its field of view and transfers the enemy information within the cluster, then realizes enemy cluster track prediction with the LSTM network, and cracks the opponent's intention based on the track prediction result; in this scene simulation it is deduced that the enemy cluster combat intention is to attack my ground targets along a fixed route (see fig. 5). Based on the pre-trained expert-knowledge-aided game technology, the my cluster derives from the mechanism library that the 2V1 capture-attack mode has the minimum cost, and autonomously forms an attack formation. During the strike process, the my game strategy changes dynamically; if a target is eliminated, the unmanned aerial vehicle automatically matches the next attack target according to the battle situation (in fig. 5, a dark triangle represents a my unmanned aerial vehicle that has entered the interception state, a light triangle represents a my unmanned aerial vehicle still in the standby state, a large black circle represents an enemy unmanned aerial vehicle that has been locked for attack, and a small light circle represents one that has not).
To further illustrate the effectiveness of the game strategy, as shown in fig. 6, the two sides begin to strike at about 120 s, and the enemy cluster is completely destroyed at about 180 s, with 10 of my aircraft remaining; the best strike at minimum cost is realized on the premise of ensuring the safety of the ground targets.
Aiming at problems such as low decision accuracy caused by deceptive enemy behavior in the unmanned aerial vehicle cluster game countermeasure process, the unmanned aerial vehicle game countermeasure strategy generation method provided by the embodiment of the application offers a technical scheme of predicting enemy behavior and interpreting intention before strategy generation. Dynamic prediction of unmanned aerial vehicle cluster behavior characteristics is realized based on neural network methods; a tactical intention reasoning model is then constructed based on the fuzzy neural network and the inverse reinforcement learning method, neural networks trained on data from different sources are combined to realize intelligent reasoning of enemy intention, and an equilibrium solution analysis and confidence of the enemy intention are provided, improving the accuracy of decision making. Aiming at the high dynamics of the battlefield environment and enemy strategies, most current game countermeasure strategy generation algorithms lack rapidity and pertinence; here, a game countermeasure strategy set is generated with expert knowledge assistance, a game countermeasure mechanism library is constructed in combination with typical task links to realize experience storage, an expert-knowledge-aided intelligent learning game strategy algorithm is constructed, and my strategies are dynamically adjusted, thereby realizing high real-time performance and strong pertinence of the decision process.
Based on the same inventive concept, the embodiment of the application also provides an unmanned plane game countermeasure policy generation system corresponding to the unmanned plane game countermeasure policy generation method, and because the principle of solving the problem of the device in the embodiment of the application is similar to that of the unmanned plane game countermeasure policy generation method in the embodiment of the application, the implementation of the device can refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an unmanned plane game countermeasure policy generation system according to an embodiment of the present application. As shown in fig. 7, the unmanned plane game countermeasure policy generation system includes:
the control module is used for determining the execution action of each red unmanned aerial vehicle at the next moment and controlling the red unmanned aerial vehicle to move according to the determined execution action, and comprises:
the track prediction unit 101 is configured to obtain a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle, and input a track prediction model trained in advance, so as to output a track prediction result of each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle;
the attack target prediction unit 102 is configured to input track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model, so as to output attack target prediction results of each target blue unmanned aerial vehicle;
The intention interpretation unit 103 is configured to input the track prediction results of all the target blue unmanned aerial vehicles and the attack target prediction results of all the target blue unmanned aerial vehicles into a pre-trained intention interpretation model, so as to output a blue unmanned aerial vehicle cluster state;
the game countermeasure unit 104 is configured to input the cluster state of the blue unmanned aerial vehicle, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model, so as to output a preferred motion of the red unmanned aerial vehicle, and control the red unmanned aerial vehicle to move according to the determined preferred motion; the target blue unmanned aerial vehicle is a blue unmanned aerial vehicle in the red unmanned aerial vehicle monitoring range, and other red unmanned aerial vehicles are red unmanned aerial vehicles in the red unmanned aerial vehicle monitoring range.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the application. As shown in fig. 8, the electronic device 800 includes a processor 810, a memory 820, and a bus 830.
The memory 820 stores machine-readable instructions executable by the processor 810, when the electronic device 800 is running, the processor 810 communicates with the memory 820 through the bus 830, and when the machine-readable instructions are executed by the processor 810, the steps of the method for generating the game countermeasure policy by the unmanned aerial vehicle in the method embodiment shown in fig. 1 can be executed, and the specific implementation is referred to the method embodiment and will not be described herein.
The embodiment of the present application further provides a computer readable storage medium, where a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the method for generating an unmanned plane game countermeasure policy in the method embodiment shown in fig. 1 can be executed, and a specific implementation manner may refer to the method embodiment and will not be described herein.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only memory (ROM), a random access memory (RandomAccessMemory, RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that the above examples are only specific embodiments of the present application, intended to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, any person skilled in the art may, within the technical scope disclosed herein, modify or readily conceive of changes to the technical solutions described in the foregoing embodiments, or substitute equivalents for some of their technical features. Such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be included in the protection scope of the present application. Therefore, the protection scope of the present application is subject to the protection scope of the claims.

Claims (9)

1. An unmanned aerial vehicle game countermeasure policy generation method, the method comprising:
for each red unmanned aerial vehicle, determining the execution action of the red unmanned aerial vehicle at the next moment and controlling the red unmanned aerial vehicle to move according to the determined execution action, in the following manner:
acquiring a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle, and inputting it into a pre-trained track prediction model to output a track prediction result of each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle;
inputting the track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model, so as to output an attack target prediction result for each target blue unmanned aerial vehicle;
inputting track prediction results of all target blue unmanned aerial vehicles and attack target prediction results of all target blue unmanned aerial vehicles into a pre-trained intent interpretation model to output a blue unmanned aerial vehicle cluster state, wherein the intent interpretation model is generated through training in the following way: acquiring a training data set, wherein the training data set comprises a plurality of groups of data samples, and each data sample comprises a plurality of sample blue unmanned aerial vehicle track sequences and corresponding blue unmanned aerial vehicle cluster states; the method comprises the steps of constructing a target fuzzy neural network model, wherein the target fuzzy neural network model comprises an input layer, a fuzzy reasoning layer and an output layer, the fuzzy layer comprises a preset number of fuzzy nodes determined according to the statistical quantity of combat modes, each fuzzy node corresponds to a different membership function, and the membership function has a formula as follows:
wherein the membership function formula, published as an image in this document, takes as input the track sequence of the input node to which the fuzzy node is connected and is parameterized by a first target parameter and a second target parameter; the fuzzy inference layer comprises a plurality of inference nodes, and the calculation rule formula of each inference node is likewise published as an image;
the output layer comprises a plurality of output nodes, and the definition function formula of each output node, likewise published as an image, involves a third target parameter;
inputting the training data set into the constructed target fuzzy neural network model, and adjusting the first target parameter, the second target parameter and the third target parameter in the target fuzzy neural network model based on a hybrid algorithm combining back propagation and the least squares method, so as to obtain the pre-trained intent interpretation model;
inputting the blue unmanned aerial vehicle cluster state, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model, so as to output the preferred action of the red unmanned aerial vehicle and control the red unmanned aerial vehicle to move according to the determined preferred action;
wherein the target blue unmanned aerial vehicle is a blue unmanned aerial vehicle within the monitoring range of the red unmanned aerial vehicle, and the other red unmanned aerial vehicles are red unmanned aerial vehicles within the monitoring range of the red unmanned aerial vehicle.
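By way of a non-limiting sketch only: the membership, inference and output formulas recited in claim 1 are published as images in this document, so the Gaussian membership form, the product firing rule, and the weighted-average output below are standard fuzzy-neural-network assumptions rather than the claimed formulas, with `c`, `sigma` and `w` standing in for the first, second and third target parameters.

```python
import math

def membership(x, c, sigma):
    """Degree of membership of scalar input x in one fuzzy node.
    Gaussian form is an assumption; c and sigma stand in for the first
    and second target parameters of the claim."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def firing_strength(memberships):
    """Inference node: product of the antecedent membership degrees of one rule
    (product rule assumed)."""
    return math.prod(memberships)

def output_node(strengths, w):
    """Output node: firing-strength-weighted average over the consequent
    parameters w (standing in for the third target parameter)."""
    total = sum(strengths)
    return sum(s * wi for s, wi in zip(strengths, w)) / total
```

Under these assumptions, the hybrid back-propagation plus least-squares procedure recited in the claim would tune `c`, `sigma` and `w` against the training data set of sample blue unmanned aerial vehicle track sequences and cluster states.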
2. The method of claim 1, wherein the cluster average field random game model outputs the preferred action of each red unmanned aerial vehicle by:
determining the action space of the red unmanned aerial vehicle cluster from a game countermeasure mechanism library according to the state of the blue unmanned aerial vehicle cluster;
determining Markov transition probability distribution of the red unmanned aerial vehicle according to the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment and the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment;
and, with the Markov transition probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables and the execution action of the red unmanned aerial vehicle as the dependent variable, solving for the execution action in the action space of the red unmanned aerial vehicle cluster that satisfies the Nash equilibrium condition, to serve as the preferred action of the red unmanned aerial vehicle.
3. The method according to claim 2, wherein the status of the blue unmanned aerial vehicle cluster at least includes formation, grouping and combat mode, and the step of determining the action space of the red unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the status of the blue unmanned aerial vehicle cluster specifically includes:
according to the blue unmanned aerial vehicle cluster state and the number of red unmanned aerial vehicles, matching a corresponding unmanned aerial vehicle cluster countermeasure scheme from the game countermeasure mechanism library, wherein the game countermeasure mechanism library comprises a plurality of unmanned aerial vehicle cluster countermeasure schemes, and each unmanned aerial vehicle cluster countermeasure scheme indicates, for each red unmanned aerial vehicle, the execution actions to be performed arranged in time sequence;
and determining, according to the matched unmanned aerial vehicle cluster countermeasure scheme, the execution actions arranged in time sequence corresponding to each red unmanned aerial vehicle, so as to generate the action space of the red unmanned aerial vehicle cluster.
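By way of a non-limiting sketch of the mechanism-library matching in claim 3: the actual contents of the game countermeasure mechanism library are not disclosed here, so the keys, cluster-state labels and per-UAV action lists below are hypothetical placeholders.

```python
# Hypothetical library: keyed by (blue cluster state, number of red UAVs),
# each entry maps a red UAV index to its time-ordered execution actions.
# All names and values are illustrative, not from the patent.
MECHANISM_LIBRARY = {
    ("pincer", 3): {0: ["climb", "flank_left"], 1: ["hold", "press"], 2: ["flank_right", "press"]},
    ("column", 3): {0: ["intercept"], 1: ["intercept"], 2: ["screen"]},
}

def red_action_space(blue_cluster_state, n_red):
    """Match a countermeasure scheme by blue cluster state and red UAV count,
    returning each red UAV's execution actions arranged in time sequence."""
    scheme = MECHANISM_LIBRARY.get((blue_cluster_state, n_red))
    if scheme is None:
        raise KeyError(f"no scheme for state {blue_cluster_state!r} with {n_red} red UAVs")
    return scheme
```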
4. The method of claim 2, wherein the relative position information comprises the line-of-sight angle between the two unmanned aerial vehicles, the entry angle between the velocity vector of the target unmanned aerial vehicle and the line of sight, the angle between the velocities of the two unmanned aerial vehicles, the distance between the two unmanned aerial vehicles, and the relative velocity between the two unmanned aerial vehicles, and the Markov transition probability distribution of the red unmanned aerial vehicle is determined by:
the total potential field energy of the red unmanned aerial vehicle is determined by the following formula
wherein the total potential field energy, whose formula is published as an image in this document, is the sum of the angle cooperative potential field, the distance cooperative potential field and the velocity cooperative potential field between this red unmanned aerial vehicle and each other red unmanned aerial vehicle in the red unmanned aerial vehicle set corresponding to this red unmanned aerial vehicle, plus the angle potential field, the distance potential field and the velocity potential field between this red unmanned aerial vehicle and each target blue unmanned aerial vehicle in the blue unmanned aerial vehicle set corresponding to this red unmanned aerial vehicle;
and determining the Markov transition probability distribution of the red unmanned aerial vehicle from this total potential field energy, the motion state of this red unmanned aerial vehicle at the current moment, and the execution action of this red unmanned aerial vehicle at the current moment, through a formula likewise published as an image.
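By way of a non-limiting sketch of claim 4: since the potential-field and transition formulas are published as images, the summation over cooperative and target-directed fields and the Boltzmann (softmax) form of the transition distribution below are assumptions, with the individual field functions supplied by the caller.

```python
import math

def total_energy(uav, teammates, targets, coop_fields, target_fields):
    """Total potential field energy: cooperative fields summed over other
    red UAVs plus target-directed fields summed over target blue UAVs."""
    energy = 0.0
    for mate in teammates:
        # e.g. angle / distance / velocity cooperative potential fields
        energy += sum(field(uav, mate) for field in coop_fields)
    for tgt in targets:
        # e.g. angle / distance / velocity potential fields toward a target
        energy += sum(field(uav, tgt) for field in target_fields)
    return energy

def transition_distribution(candidate_energies, beta=1.0):
    """Assumed Boltzmann form: candidate successor states with lower total
    potential field energy receive higher transition probability."""
    weights = [math.exp(-beta * e) for e in candidate_energies]
    z = sum(weights)
    return [w / z for w in weights]
```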
5. The method according to claim 4, wherein the step of solving for the execution action in the action space that satisfies the Nash equilibrium condition, as the preferred action of the red unmanned aerial vehicle, with the Markov transition probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables, specifically comprises:
determining the execution action satisfying the Nash equilibrium condition as the preferred action of the red unmanned aerial vehicle through a formula that is published as an image in this document, whose symbols include the preferred action of this red unmanned aerial vehicle, the discount rate, and the action space.
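By way of a non-limiting sketch of claim 5: with the Nash-equilibrium formula published only as an image, the discounted-expected-value criterion below (immediate reward plus the discount rate times the expected value of the successor state under the Markov transition distribution) is an assumption about its form.

```python
def preferred_action(state, actions, reward, transition, value, gamma=0.9):
    """Pick the action in the action space maximizing
    r(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s'),
    where transition(s, a) returns {successor_state: probability}."""
    def q(action):
        expected = sum(p * value(nxt)
                       for nxt, p in transition(state, action).items())
        return reward(state, action) + gamma * expected
    return max(actions, key=q)
```

In a full mean-field game solution each UAV's value function would itself depend on the population's action distribution; the one-step greedy choice here only illustrates the role of the discount rate and the transition distribution.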
6. The method as recited in claim 2, further comprising:
and storing the acquired cluster states and scene information of the red-side and blue-side unmanned aerial vehicles in the game countermeasure mechanism library, for optimizing and updating the intent interpretation model and the cluster average field random game model.
7. An unmanned aerial vehicle game countermeasure policy generation system, the system comprising:
the control module is used for determining the execution action of each red unmanned aerial vehicle at the next moment and controlling the red unmanned aerial vehicle to move according to the determined execution action, and the control module comprises:
the track prediction unit is used for acquiring a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle, inputting it into a pre-trained track prediction model, and outputting a track prediction result of each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle;
The attack target prediction unit is used for inputting track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model so as to output attack target prediction results of each target blue unmanned aerial vehicle;
the intent interpretation unit is used for inputting track prediction results of all the target blue unmanned aerial vehicles and attack target prediction results of all the target blue unmanned aerial vehicles into a pre-trained intent interpretation model so as to output a blue unmanned aerial vehicle cluster state, wherein the intent interpretation model is generated through training in the following way: acquiring a training data set, wherein the training data set comprises a plurality of groups of data samples, and each data sample comprises a plurality of sample blue unmanned aerial vehicle track sequences and corresponding blue unmanned aerial vehicle cluster states; the method comprises the steps of constructing a target fuzzy neural network model, wherein the target fuzzy neural network model comprises an input layer, a fuzzy reasoning layer and an output layer, the fuzzy layer comprises a preset number of fuzzy nodes determined according to the statistical quantity of combat modes, each fuzzy node corresponds to a different membership function, and the membership function has a formula as follows:
wherein the membership function formula, published as an image in this document, takes as input the track sequence of the input node to which the fuzzy node is connected and is parameterized by a first target parameter and a second target parameter; the fuzzy inference layer comprises a plurality of inference nodes, and the calculation rule formula of each inference node is likewise published as an image;
the output layer comprises a plurality of output nodes, and the definition function formula of each output node, likewise published as an image, involves a third target parameter;
inputting the training data set into the constructed target fuzzy neural network model, and adjusting the first target parameter, the second target parameter and the third target parameter in the target fuzzy neural network model based on a hybrid algorithm combining back propagation and the least squares method, so as to obtain the pre-trained intent interpretation model;
the game countermeasure unit is used for inputting the blue unmanned aerial vehicle cluster state, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model, so as to output the preferred action of the red unmanned aerial vehicle and control the red unmanned aerial vehicle to move according to the determined preferred action;
wherein the target blue unmanned aerial vehicle is a blue unmanned aerial vehicle within the monitoring range of the red unmanned aerial vehicle, and the other red unmanned aerial vehicles are red unmanned aerial vehicles within the monitoring range of the red unmanned aerial vehicle.
8. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor; when the electronic device is running, the processor and the memory communicate over the bus, and the processor executes the machine-readable instructions to perform the steps of the unmanned aerial vehicle game countermeasure policy generation method of any of claims 1 to 6.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the unmanned aerial vehicle game countermeasure policy generation method of any of claims 1 to 6.
CN202310628021.7A 2023-05-31 2023-05-31 Unmanned plane game countermeasure strategy generation method and system and electronic equipment Active CN116360503B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310628021.7A CN116360503B (en) 2023-05-31 2023-05-31 Unmanned plane game countermeasure strategy generation method and system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310628021.7A CN116360503B (en) 2023-05-31 2023-05-31 Unmanned plane game countermeasure strategy generation method and system and electronic equipment

Publications (2)

Publication Number Publication Date
CN116360503A CN116360503A (en) 2023-06-30
CN116360503B true CN116360503B (en) 2023-10-13

Family

ID=86922516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310628021.7A Active CN116360503B (en) 2023-05-31 2023-05-31 Unmanned plane game countermeasure strategy generation method and system and electronic equipment

Country Status (1)

Country Link
CN (1) CN116360503B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117150738B (en) * 2023-08-10 2024-05-10 中国船舶集团有限公司第七〇九研究所 Action direction pre-judging method under complex scene
CN116842127B (en) * 2023-08-31 2023-12-05 中国人民解放军海军航空大学 Self-adaptive auxiliary decision-making intelligent method and system based on multi-source dynamic data
CN117808596A (en) * 2024-01-26 2024-04-02 国金证券股份有限公司 Large language model multi-Agent system based on long-short-term memory module in securities and futures industry
CN117788164A (en) * 2024-01-26 2024-03-29 国金证券股份有限公司 Multi-Agent cooperative control algorithm and system for large language model in securities and futures industry

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269396A (en) * 2020-10-14 2021-01-26 北京航空航天大学 Unmanned aerial vehicle cluster cooperative confrontation control method for eagle pigeon-imitated intelligent game
CN112947541A (en) * 2021-01-15 2021-06-11 南京航空航天大学 Unmanned aerial vehicle intention track prediction method based on deep reinforcement learning
CN114063644A (en) * 2021-11-09 2022-02-18 北京航空航天大学 Unmanned combat aircraft air combat autonomous decision method based on pigeon flock reverse confrontation learning
CN114492749A (en) * 2022-01-24 2022-05-13 中国电子科技集团公司第五十四研究所 Time-limited red-blue countermeasure problem-oriented game decision method with action space decoupling function

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3063554B1 (en) * 2017-03-03 2021-04-02 Mbda France METHOD AND DEVICE FOR PREDICTING OPTIMAL ATTACK AND DEFENSE SOLUTIONS IN A MILITARY CONFLICT SCENARIO
CN113095481B (en) * 2021-04-03 2024-02-02 西北工业大学 Air combat maneuver method based on parallel self-game

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269396A (en) * 2020-10-14 2021-01-26 北京航空航天大学 Unmanned aerial vehicle cluster cooperative confrontation control method for eagle pigeon-imitated intelligent game
CN112947541A (en) * 2021-01-15 2021-06-11 南京航空航天大学 Unmanned aerial vehicle intention track prediction method based on deep reinforcement learning
CN114063644A (en) * 2021-11-09 2022-02-18 北京航空航天大学 Unmanned combat aircraft air combat autonomous decision method based on pigeon flock reverse confrontation learning
CN114492749A (en) * 2022-01-24 2022-05-13 中国电子科技集团公司第五十四研究所 Time-limited red-blue countermeasure problem-oriented game decision method with action space decoupling function

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on multi-UAV cooperative confrontation decision-making; Shao Jiang; Xu Yang; Luo Delin; Information and Control; Vol. 47, No. 3; pp. 347-354 *

Also Published As

Publication number Publication date
CN116360503A (en) 2023-06-30

Similar Documents

Publication Publication Date Title
CN116360503B (en) Unmanned plane game countermeasure strategy generation method and system and electronic equipment
Foerster et al. Stabilising experience replay for deep multi-agent reinforcement learning
Hu et al. Application of deep reinforcement learning in maneuver planning of beyond-visual-range air combat
Shantia et al. Connectionist reinforcement learning for intelligent unit micro management in starcraft
CN114115285B (en) Multi-agent emotion target path planning method and device
Feng et al. Towards human-like social multi-agents with memetic automaton
CN116661503B (en) Cluster track automatic planning method based on multi-agent safety reinforcement learning
Cao et al. Autonomous maneuver decision of UCAV air combat based on double deep Q network algorithm and stochastic game theory
Yuan et al. Research on UCAV maneuvering decision method based on heuristic reinforcement learning
CN113139331A (en) Air-to-air missile situation perception and decision method based on Bayesian network
Kong et al. Hierarchical multi‐agent reinforcement learning for multi‐aircraft close‐range air combat
CN112651486A (en) Method for improving convergence rate of MADDPG algorithm and application thereof
CN116136945A (en) Unmanned aerial vehicle cluster countermeasure game simulation method based on anti-facts base line
Feng et al. Multifunctional radar cognitive jamming decision based on dueling double deep Q-network
Xianyong et al. Research on maneuvering decision algorithm based on improved deep deterministic policy gradient
CN117313561B (en) Unmanned aerial vehicle intelligent decision model training method and unmanned aerial vehicle intelligent decision method
CN118416460A (en) Interpretable chess prediction method and system based on heterogeneous graph neural network
Liu et al. Evolutionary algorithm-based attack strategy with swarm robots in denied environments
Zhang et al. Situational continuity-based air combat autonomous maneuvering decision-making
Yang et al. WISDOM-II: A network centric model for warfare
Hou et al. Advances in memetic automaton: Toward human-like autonomous agents in complex multi-agent learning problems
Zhao et al. Deep Reinforcement Learning‐Based Air Defense Decision‐Making Using Potential Games
CN114565261B (en) GMQN-based collaborative combat control method, system, equipment and medium
CN115310257B (en) Situation estimation method and device based on artificial potential field
CN114757092A (en) System and method for training multi-agent cooperative communication strategy based on teammate perception

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant