CN116360503A - Unmanned aerial vehicle game countermeasure strategy generation method and system and electronic device - Google Patents


Info

Publication number
CN116360503A
Authority
CN
China
Prior art keywords: unmanned aerial vehicle, red, target, blue
Prior art date
Legal status
Granted
Application number
CN202310628021.7A
Other languages
Chinese (zh)
Other versions
CN116360503B (en)
Inventor
刘昊
吕金虎
王新迪
高庆
刘德元
钟森
Current Assignee
Beihang University
Academy of Mathematics and Systems Science of CAS
Original Assignee
Beihang University
Academy of Mathematics and Systems Science of CAS
Priority date
Filing date
Publication date
Application filed by Beihang University and Academy of Mathematics and Systems Science of CAS
Priority to CN202310628021.7A
Publication of CN116360503A
Application granted
Publication of CN116360503B
Legal status: Active

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10: Simultaneous control of position or course in three dimensions
    • G05D1/101: Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/106: Change initiated in response to external conditions, e.g. avoidance of elevated terrain or of no-fly zones
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides an unmanned aerial vehicle game countermeasure strategy generation method and system and an electronic device, relating to the technical field of aircraft control. The method inputs the trajectory prediction results of all target blue-side unmanned aerial vehicles and the attack target prediction results of all target blue-side unmanned aerial vehicles into a pre-trained intention interpretation model to output the blue-side unmanned aerial vehicle cluster state; it then inputs the blue-side cluster state, the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicles at the current moment, the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment, and the motion state of the red-side unmanned aerial vehicle at the current moment into a pre-trained cluster mean-field stochastic game model to output the preferred action of the red-side unmanned aerial vehicle, and controls the red-side unmanned aerial vehicle to move according to the determined preferred action, thereby improving the accuracy of unmanned aerial vehicle game defense strategy generation.

Description

Unmanned aerial vehicle game countermeasure strategy generation method and system and electronic device
Technical Field
The application relates to the technical field of aircraft control, and in particular to an unmanned aerial vehicle (UAV) game countermeasure strategy generation method and system and an electronic device.
Background
Unmanned aerial vehicle game defense strategy autonomous generation technology refers to technology by which an unmanned aerial vehicle cluster autonomously generates a game strategy in a combat environment, based on the battlefield situation and the perceived information of both friendly and enemy parties, so as to counter the enemy's combat intent, protect friendly ground targets and achieve friendly combat objectives. Existing strategy generation methods have low decision accuracy when the enemy unmanned aerial vehicle cluster employs deception and feint scenarios, so a strategy generation algorithm with higher decision accuracy is needed.
Disclosure of Invention
In view of this, the purpose of the present application is to provide an unmanned aerial vehicle game countermeasure strategy generation method and system and an electronic device, so as to improve the accuracy of unmanned aerial vehicle game defense strategy generation.
In a first aspect, the present application provides an unmanned aerial vehicle game countermeasure strategy generation method, the method comprising: for each red-side unmanned aerial vehicle, determining the execution action of the red-side unmanned aerial vehicle at the next moment and controlling the red-side unmanned aerial vehicle to move according to the determined execution action, in the following manner: acquiring a historical trajectory sequence of at least one target blue-side unmanned aerial vehicle collected by the red-side unmanned aerial vehicle, and inputting it into a pre-trained trajectory prediction model to output a trajectory prediction result for each target blue-side unmanned aerial vehicle corresponding to the red-side unmanned aerial vehicle; inputting the trajectory prediction results of all target blue-side unmanned aerial vehicles corresponding to all red-side unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model to output an attack target prediction result for each target blue-side unmanned aerial vehicle; inputting the trajectory prediction results of all target blue-side unmanned aerial vehicles and the attack target prediction results of all target blue-side unmanned aerial vehicles into a pre-trained intention interpretation model to output the blue-side unmanned aerial vehicle cluster state; and inputting the blue-side unmanned aerial vehicle cluster state, the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicles at the current moment, the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment, and the motion state of the red-side unmanned aerial vehicle at the current moment into a pre-trained cluster mean-field stochastic game model to output the preferred action of the red-side unmanned aerial vehicle, and controlling the red-side unmanned aerial vehicle to move according to the determined preferred action; wherein a target blue-side unmanned aerial vehicle is a blue-side unmanned aerial vehicle within the red-side unmanned aerial vehicle's monitoring range, and the other red-side unmanned aerial vehicles are red-side unmanned aerial vehicles within the red-side unmanned aerial vehicle's monitoring range.
Preferably, the cluster mean-field stochastic game model outputs the preferred action of each red-side unmanned aerial vehicle by: determining the action space of the red-side unmanned aerial vehicle cluster from a game countermeasure mechanism library according to the blue-side unmanned aerial vehicle cluster state; determining the Markov transition probability distribution of the red-side unmanned aerial vehicle according to the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicles at the current moment and the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment; and, taking the Markov transition probability distribution of the red-side unmanned aerial vehicle and the motion state of the red-side unmanned aerial vehicle at the current moment as independent variables and the execution action of the red-side unmanned aerial vehicle as the dependent variable, solving for the execution action in the action space of the red-side unmanned aerial vehicle cluster that satisfies the Nash equilibrium condition, as the preferred action of the red-side unmanned aerial vehicle.
Preferably, the blue-side unmanned aerial vehicle cluster state comprises at least a formation, a grouping and a combat mode, and the step of determining the action space of the red-side unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the blue-side unmanned aerial vehicle cluster state specifically comprises: matching a corresponding unmanned aerial vehicle cluster countermeasure scheme from the game countermeasure mechanism library according to the blue-side unmanned aerial vehicle cluster state and the number of red-side unmanned aerial vehicles, wherein the game countermeasure mechanism library comprises a plurality of unmanned aerial vehicle cluster countermeasure schemes, each of which instructs every red-side unmanned aerial vehicle to execute actions arranged in time order; and determining, according to the matched unmanned aerial vehicle cluster countermeasure scheme, the time-ordered execution actions corresponding to each red-side unmanned aerial vehicle, so as to generate the action space of the red-side unmanned aerial vehicle cluster.
Preferably, the relative position information comprises the line-of-sight angle between the two unmanned aerial vehicles, the entry angle between the target unmanned aerial vehicle's velocity vector and the line of sight, the included angle between the velocities of the two unmanned aerial vehicles, the distance between the two unmanned aerial vehicles, and the relative speed between the two unmanned aerial vehicles, and the Markov transition probability distribution of the red-side unmanned aerial vehicle is determined as follows:
the total potential field energy \(E_i\) of the red-side unmanned aerial vehicle \(i\) is determined as the sum of its cooperative and adversarial potential field components (the component equations are supplied as figures in the original):

\[E_i=\sum_{j}\left(U_{ij}^{\theta}+U_{ij}^{d}+U_{ij}^{v}\right)+\sum_{k}\left(V_{ik}^{\theta}+V_{ik}^{d}+V_{ik}^{v}\right)\]

where \(U_{ij}^{\theta}\) is the angle cooperative potential field between this red-side unmanned aerial vehicle \(i\) and another red-side unmanned aerial vehicle \(j\), \(U_{ij}^{d}\) is the distance cooperative potential field between them, \(U_{ij}^{v}\) is the velocity cooperative potential field between them, \(V_{ik}^{\theta}\) is the angle potential field between this red-side unmanned aerial vehicle \(i\) and the target blue-side unmanned aerial vehicle \(k\), \(V_{ik}^{d}\) is the distance potential field between them, and \(V_{ik}^{v}\) is the velocity potential field between them;

and the Markov transition probability distribution of the red-side unmanned aerial vehicle is determined as

\[P\left(s_i^{t+1}\mid s_i^{t},a_i^{t}\right)\]

where \(s_i^{t}\) is the motion state of this red-side unmanned aerial vehicle \(i\) at the current moment \(t\), and \(a_i^{t}\) is the execution action of this red-side unmanned aerial vehicle \(i\) at the current moment \(t\).
Preferably, the step of taking the potential-field probability distribution of the red-side unmanned aerial vehicle and the motion state of the red-side unmanned aerial vehicle at the current moment as independent variables and the execution action of the red-side unmanned aerial vehicle as the dependent variable, and solving for the execution action in the action space that satisfies the Nash equilibrium condition as the preferred action of the red-side unmanned aerial vehicle, specifically comprises:

determining the execution action satisfying the Nash equilibrium condition as the preferred action of the red-side unmanned aerial vehicle through a value-maximization formula (the original formulas appear as figures):

\[a_i^{*}=\arg\max_{a_i}V_i\left(s_i,a_i\right)\]

where \(V_i\) is the value function of this red-side unmanned aerial vehicle \(i\), \(a_i^{*}\) is the preferred action of this red-side unmanned aerial vehicle \(i\), and the discount rate \(\gamma\in(0,1)\).
Preferably, the method further comprises storing the acquired cluster states of both sides' unmanned aerial vehicles and the corresponding scene information in the game countermeasure mechanism library, for optimizing and updating the intention interpretation model and the cluster mean-field stochastic game model.
Preferably, the intention interpretation model is generated by training in the following way: acquiring a training data set, wherein the training data set comprises a plurality of groups of data samples, each data sample comprising the trajectory sequences of a plurality of sample blue-side unmanned aerial vehicles and the corresponding blue-side unmanned aerial vehicle cluster state; and constructing a target fuzzy neural network model comprising an input layer, a fuzzification layer, a fuzzy inference layer and an output layer, wherein the fuzzification layer comprises a preset number of fuzzy nodes determined according to the statistical number of combat modes, each fuzzy node corresponding to a different membership function (rendered as a figure in the original; a Gaussian form with the two target parameters is assumed here):

\[\mu_{ij}(x_i)=\exp\left(-\frac{(x_i-c_{ij})^{2}}{\sigma_{ij}^{2}}\right),\quad i=1,\dots,n,\ j=1,\dots,m\]

where \(x_i\) is the trajectory sequence corresponding to input node \(i\) in the input layer, \(\mu_{ij}\) is the membership function corresponding to fuzzy node \(j\) connected to input node \(i\), \(c_{ij}\) is the first target parameter, and \(\sigma_{ij}\) is the second target parameter; the fuzzy inference layer comprises a plurality of inference nodes, the calculation rule of each inference node being

\[w_{j}=\prod_{i}\mu_{ij}(x_i)\]

and the output layer comprises a plurality of output nodes, the clarification (defuzzification) function of each output node being

\[y_{k}=\frac{\sum_{j}p_{kj}\,w_{j}}{\sum_{j}w_{j}}\]

where \(p_{kj}\) is the third target parameter; and inputting the training data set into the constructed target fuzzy neural network model, and adjusting the first, second and third target parameters in the model based on a hybrid algorithm combining back propagation and least squares, so as to obtain the pre-trained intention interpretation model.
In a second aspect, the present application provides an unmanned aerial vehicle game countermeasure strategy generation system, the system comprising: a control module, configured to determine, for each red-side unmanned aerial vehicle, the execution action of the red-side unmanned aerial vehicle at the next moment and to control the red-side unmanned aerial vehicle to move according to the determined execution action, the control module comprising: a trajectory prediction unit, configured to acquire a historical trajectory sequence of at least one target blue-side unmanned aerial vehicle collected by the red-side unmanned aerial vehicle and input it into a pre-trained trajectory prediction model, so as to output a trajectory prediction result for each target blue-side unmanned aerial vehicle corresponding to the red-side unmanned aerial vehicle; an attack target prediction unit, configured to input the trajectory prediction results of all target blue-side unmanned aerial vehicles corresponding to all red-side unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model, so as to output an attack target prediction result for each target blue-side unmanned aerial vehicle; an intention interpretation unit, configured to input the trajectory prediction results of all target blue-side unmanned aerial vehicles and the attack target prediction results of all target blue-side unmanned aerial vehicles into a pre-trained intention interpretation model, so as to output the blue-side unmanned aerial vehicle cluster state; and a game countermeasure unit, configured to input the blue-side unmanned aerial vehicle cluster state, the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicles at the current moment, the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment, and the motion state of the red-side unmanned aerial vehicle at the current moment into a pre-trained cluster mean-field stochastic game model, so as to output the preferred action of the red-side unmanned aerial vehicle and control the red-side unmanned aerial vehicle to move according to the determined preferred action; wherein a target blue-side unmanned aerial vehicle is a blue-side unmanned aerial vehicle within the red-side unmanned aerial vehicle's monitoring range, and the other red-side unmanned aerial vehicles are red-side unmanned aerial vehicles within the red-side unmanned aerial vehicle's monitoring range.
In a third aspect, the present application further provides an electronic device, comprising: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the unmanned aerial vehicle game countermeasure strategy generation method described above.
In a fourth aspect, the present application further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the unmanned aerial vehicle game countermeasure strategy generation method described above.
In the method, for each red-side unmanned aerial vehicle, the execution action of the red-side unmanned aerial vehicle at the next moment is determined and the red-side unmanned aerial vehicle is controlled to move accordingly, by: acquiring a historical trajectory sequence of at least one target blue-side unmanned aerial vehicle collected by the red-side unmanned aerial vehicle and inputting it into a pre-trained trajectory prediction model to output a trajectory prediction result for each target blue-side unmanned aerial vehicle; inputting the trajectory prediction results of all target blue-side unmanned aerial vehicles corresponding to all red-side unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model to output an attack target prediction result for each target blue-side unmanned aerial vehicle; inputting the trajectory prediction results and attack target prediction results of all target blue-side unmanned aerial vehicles into a pre-trained intention interpretation model to output the blue-side unmanned aerial vehicle cluster state; and inputting the blue-side unmanned aerial vehicle cluster state, the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicles at the current moment, the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment, and the motion state of the red-side unmanned aerial vehicle at the current moment into a pre-trained cluster mean-field stochastic game model to output the preferred action of the red-side unmanned aerial vehicle, and controlling the red-side unmanned aerial vehicle to move according to the determined preferred action, wherein a target blue-side unmanned aerial vehicle is a blue-side unmanned aerial vehicle within the red-side unmanned aerial vehicle's monitoring range and the other red-side unmanned aerial vehicles are red-side unmanned aerial vehicles within that monitoring range. Because the behavior and intent of the blue-side unmanned aerial vehicles are analyzed through deep learning before strategy generation, the game countermeasure strategy of the red side is dynamically generated and adjusted, improving both the accuracy and the timeliness of decisions.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be regarded as limiting the scope; other related drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of an unmanned aerial vehicle game countermeasure strategy generation method according to an embodiment of the present application;

Fig. 2 is a flowchart of the steps for determining a preferred action according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a game countermeasure strategy generation scenario according to an embodiment of the present application;

Fig. 4 is a schematic diagram of a first unmanned aerial vehicle game countermeasure simulation according to an embodiment of the present application;

Fig. 5 is a schematic diagram of a second unmanned aerial vehicle game countermeasure simulation according to an embodiment of the present application;

Fig. 6 is a schematic diagram of the change in the numbers of unmanned aerial vehicles according to an embodiment of the present application;

Fig. 7 is a block diagram of an unmanned aerial vehicle game countermeasure strategy generation system according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. The components of the embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Accordingly, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the application, but merely represents selected embodiments. Every other embodiment obtained by a person skilled in the art based on the embodiments of the present application without inventive effort falls within the protection scope of the present application.
First, an application scenario to which the present application is applicable will be described. The method can be applied to game countermeasure strategy generation for unmanned aerial vehicle cooperative combat, and is particularly suitable for control strategy generation for rotor unmanned aerial vehicles.
Research shows that unmanned aerial vehicle game defense strategy autonomous generation technology refers to technology by which an unmanned aerial vehicle cluster autonomously generates a game strategy in a combat environment, based on the battlefield situation and the perceived information of both friendly and enemy parties, so as to counter the enemy's combat intent, protect friendly ground targets and achieve friendly combat objectives. In the prior art, for example, Lei et al. propose an optimal strategy based on complete information and Markov models, realizing attack and defense against a moving target; Carter et al. consider the dynamic switching of game models under different attack scenarios and thereby propose a strategy generation algorithm; and Garcia et al. model the unmanned aerial vehicle cluster game problem as a differential game of cluster strike, establishing a whole-process performance function and a strike-capability evaluation function to derive the cluster counter-strike guidance law in that scenario.
Based on the above, the embodiments of the present application provide an unmanned aerial vehicle game countermeasure strategy generation method and system and an electronic device, so as to improve the accuracy of unmanned aerial vehicle game defense strategy generation.

Referring to fig. 1 and fig. 3, fig. 1 is a flowchart of an unmanned aerial vehicle game countermeasure strategy generation method according to an embodiment of the present application, and fig. 3 is a schematic diagram of a game countermeasure strategy generation scenario according to an embodiment of the present application. As shown in fig. 1, the unmanned aerial vehicle game countermeasure strategy generation method provided by the embodiments of the present application includes:
for each red-side unmanned aerial vehicle, determining the execution action of the red-side unmanned aerial vehicle at the next moment, and controlling the red-side unmanned aerial vehicle to move according to the determined execution action, in the following manner:

Here, in one embodiment of the present application, the execution actions of the red-side unmanned aerial vehicle change dynamically; that is, the execution action of each red-side unmanned aerial vehicle at the next moment is determined based on the historical data collected by that red-side unmanned aerial vehicle. The historical data here include, but are not limited to, the states of the blue-side unmanned aerial vehicles and the scene information collected by the red-side unmanned aerial vehicle over a period of time. Actions here include, but are not limited to, the speed and attitude of the unmanned aerial vehicle, whether to attack, and so on.
S101: acquiring a historical trajectory sequence of at least one target blue-side unmanned aerial vehicle collected by the red-side unmanned aerial vehicle, and inputting it into a pre-trained trajectory prediction model to output a trajectory prediction result for each target blue-side unmanned aerial vehicle corresponding to the red-side unmanned aerial vehicle.

In step S101, the trajectory prediction model is trained and generated based on an LSTM (Long Short-Term Memory) neural network model. The input of the trajectory prediction model is the trajectory sequence of each target blue-side unmanned aerial vehicle, where a target blue-side unmanned aerial vehicle is any blue-side unmanned aerial vehicle monitored within the sensing range of the red-side unmanned aerial vehicle. The historical trajectory sequence may be continuous or discrete, such as the continuous three-dimensional position coordinates of the target blue-side unmanned aerial vehicle over the past 10 seconds, or its three-dimensional position coordinates at the 8th, 5th and 3rd seconds. The output of the trajectory prediction model is the predicted trajectory sequence corresponding to each input target blue-side unmanned aerial vehicle, for example the three-dimensional position coordinates within the next 5 seconds or within the next 3 seconds; the shorter the predicted trajectory sequence, the more it benefits the prediction in step S102.
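To make step S101 concrete, the following is a minimal sketch of an LSTM trajectory predictor of the kind described here, assuming PyTorch; the hidden size, prediction horizon, sampling rate and all names are illustrative assumptions rather than the patented implementation.

```python
# Illustrative sketch only: an LSTM that maps a history of 3-D positions of one
# target blue-side UAV to a short predicted trajectory.
import torch
import torch.nn as nn

class TrajectoryPredictor(nn.Module):
    def __init__(self, hidden: int = 64, horizon: int = 5):
        super().__init__()
        self.horizon = horizon                       # number of future steps to predict
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3 * horizon)   # decode future (x, y, z) positions

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, T, 3) past positions -> (batch, horizon, 3) predicted positions
        _, (h_n, _) = self.lstm(history)
        return self.head(h_n[-1]).view(-1, self.horizon, 3)

# usage: 10 s of past positions sampled at 1 Hz for one target blue-side UAV
model = TrajectoryPredictor()
future = model(torch.randn(1, 10, 3))   # (1, 5, 3)
```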
S102: inputting the trajectory prediction results of all target blue-side unmanned aerial vehicles corresponding to all red-side unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model, so as to output an attack target prediction result for each target blue-side unmanned aerial vehicle.

The attack target prediction model is trained and generated using an IDCNN (Iterated Dilated Convolutional Neural Network) model. The input of the attack target prediction model is the trajectory prediction result of each target blue-side unmanned aerial vehicle output by the trajectory prediction model. The output of the attack target prediction model is the attack target prediction result of each target blue-side unmanned aerial vehicle; for example, the attack target of target blue-side unmanned aerial vehicle A is red-side unmanned aerial vehicle B, the attack target of target blue-side unmanned aerial vehicle C is red-side ground target D, and so on. An attack target comprises at least a red-side unmanned aerial vehicle or a red-side ground target.
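As an illustration of step S102, below is a minimal sketch of a one-dimensional iterated-dilated-convolution classifier over a predicted trajectory, again assuming PyTorch; the dilation rates, channel count and number of candidate targets are assumptions.

```python
# Illustrative sketch only: stacked dilated 1-D convolutions widen the receptive
# field over the predicted trajectory, followed by pooling and a linear classifier.
import torch
import torch.nn as nn

class AttackTargetClassifier(nn.Module):
    def __init__(self, n_targets: int, channels: int = 32):
        super().__init__()
        layers, in_ch = [], 3
        for d in (1, 2, 4):   # iterated dilation rates
            layers += [nn.Conv1d(in_ch, channels, kernel_size=3, dilation=d, padding=d),
                       nn.ReLU()]
            in_ch = channels
        self.conv = nn.Sequential(*layers)
        self.fc = nn.Linear(channels, n_targets)

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        # traj: (batch, horizon, 3) predicted positions -> (batch, n_targets) logits
        x = self.conv(traj.transpose(1, 2))   # Conv1d expects (batch, channels, time)
        return self.fc(x.mean(dim=2))         # pool over time, then classify

logits = AttackTargetClassifier(n_targets=4)(torch.randn(1, 5, 3))
```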
S103: inputting the trajectory prediction results of all target blue-side unmanned aerial vehicles and the attack target prediction results of all target blue-side unmanned aerial vehicles into a pre-trained intention interpretation model, so as to output the blue-side unmanned aerial vehicle cluster state.

The intention interpretation model is generated by training a fuzzy neural network model. The input of the intention interpretation model is the behavior sequence of each target blue-side unmanned aerial vehicle determined from the outputs of the trajectory prediction model and the attack target prediction model. The output of the intention interpretation model is the blue-side unmanned aerial vehicle cluster state, which comprises at least formation, grouping and combat-mode information; for example, a first formation groups to strike our ground targets, a second formation clusters to reconnoiter our information, and a third formation clusters to jam our unmanned aerial vehicle B.
S104: inputting the blue-side unmanned aerial vehicle cluster state, the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicles at the current moment, the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment, and the motion state of the red-side unmanned aerial vehicle at the current moment into a pre-trained cluster mean-field stochastic game model, so as to output the preferred action of the red-side unmanned aerial vehicle and control the red-side unmanned aerial vehicle to move according to the determined preferred action.

A target blue-side unmanned aerial vehicle is a blue-side unmanned aerial vehicle within the red-side unmanned aerial vehicle's monitoring range, and the other red-side unmanned aerial vehicles are red-side unmanned aerial vehicles within the red-side unmanned aerial vehicle's monitoring range.

The cluster mean-field stochastic game model is generated by training, based on reinforcement learning, an expert-knowledge-assisted mean-field cluster game strategy generation algorithm. The inputs of the cluster mean-field stochastic game model include the friend-foe situation (the relative geometric information between pairs of aircraft), the enemy unmanned aerial vehicle cluster state, and so on. The output of the cluster mean-field stochastic game model is the preferred action to be executed by the red-side unmanned aerial vehicle at the next moment.
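The following sketch shows how the four pre-trained models could be chained for one red-side unmanned aerial vehicle at one decision step; every interface here, and the stub models used to make the sketch run, are assumptions made for illustration.

```python
# Illustrative sketch only: per-UAV decision pipeline S101 -> S102 -> S103 -> S104.
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class UAV:
    position: tuple   # (x, y, z)
    velocity: tuple   # (vx, vy, vz)
    history: list     # past positions

def relative_info(uav: UAV, others: Sequence[UAV]) -> list:
    # relative position vectors to every UAV within the monitoring range
    return [tuple(o - s for o, s in zip(other.position, uav.position))
            for other in others]

def decide_action(red: UAV, blues: List[UAV], reds: List[UAV],
                  traj_model: Callable, target_model: Callable,
                  intent_model: Callable, game_model: Callable):
    trajs = [traj_model(b.history) for b in blues]   # S101: trajectory prediction
    targets = [target_model(t) for t in trajs]       # S102: attack-target prediction
    cluster_state = intent_model(trajs, targets)     # S103: intention interpretation
    return game_model(cluster_state,                 # S104: mean-field game model
                      relative_info(red, blues),
                      relative_info(red, reds),
                      (red.position, red.velocity))

# stub models so the sketch runs end to end
blue = UAV((100.0, 0.0, 50.0), (-4.0, 0.0, 0.0),
           [(110.0, 0.0, 50.0), (105.0, 0.0, 50.0), (100.0, 0.0, 50.0)])
red = UAV((0.0, 0.0, 50.0), (5.0, 0.0, 0.0), [])
action = decide_action(red, [blue], [],
                       traj_model=lambda h: h[-1],
                       target_model=lambda t: "ground_target_A",
                       intent_model=lambda ts, gs: "strike_formation",
                       game_model=lambda *args: "intercept")
```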
As shown in fig. 2, fig. 2 is a flowchart of the steps for determining a preferred action according to an embodiment of the present application. Specifically, the cluster mean-field stochastic game model outputs the preferred action of each red-side unmanned aerial vehicle as follows:
S201: determining the action space of the red-side unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the blue-side unmanned aerial vehicle cluster state.
The blue-side unmanned aerial vehicle cluster state comprises at least a formation, a grouping and a combat mode, and the step of determining the action space of the red-side unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the blue-side unmanned aerial vehicle cluster state specifically comprises the following:

matching a corresponding unmanned aerial vehicle cluster countermeasure scheme from the game countermeasure mechanism library according to the blue-side unmanned aerial vehicle cluster state and the number of red-side unmanned aerial vehicles, wherein the game countermeasure mechanism library comprises a plurality of unmanned aerial vehicle cluster countermeasure schemes and each scheme instructs every red-side unmanned aerial vehicle to execute actions arranged in time order; and determining, according to the matched unmanned aerial vehicle cluster countermeasure scheme, the time-ordered execution actions corresponding to each red-side unmanned aerial vehicle, so as to generate the action space of the red-side unmanned aerial vehicle cluster.

The game countermeasure mechanism library may be pre-established; it comprises a plurality of unmanned aerial vehicle cluster countermeasure schemes, each of which instructs every red-side unmanned aerial vehicle in the cluster to execute actions in time order, for example moving toward the enemy in a predetermined patrol formation, or intercepting and attacking the enemy in a 2-versus-1 capture pattern.
The unmanned aerial vehicle cluster countermeasure scheme here may be expressed as \(\Gamma=\{\tau_1,\tau_2,\dots,\tau_N\}\), where \(\tau_i\) denotes the time-ordered sequence of execution actions of the \(i\)-th red-side unmanned aerial vehicle (the original notation is rendered as figures). The action space here can be expressed as \(\mathcal{A}=\mathcal{A}_1\times\mathcal{A}_2\times\dots\times\mathcal{A}_N\). At any decision moment, the action \(a_i\) to be taken by unmanned aerial vehicle \(i\) relates not only to itself but also to the whole cluster.
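A minimal sketch of how such a scheme might be matched from the game countermeasure mechanism library and expanded into per-vehicle, time-ordered actions follows; the library keys, scheme contents and action names are invented for illustration.

```python
# Illustrative sketch only: the library maps (blue cluster state, red UAV count)
# to a scheme giving each red-side UAV its time-ordered execution actions.
MECHANISM_LIBRARY = {
    ("strike_formation", 3): {"uav_1": ["intercept", "attack"],
                              "uav_2": ["intercept", "attack"],
                              "uav_3": ["patrol", "support"]},
    ("recon_formation", 3):  {"uav_1": ["jam"],
                              "uav_2": ["shadow"],
                              "uav_3": ["patrol"]},
}

def red_cluster_action_space(blue_state: str, n_red: int) -> dict:
    # the matched scheme fixes the time-ordered actions of every red-side UAV,
    # which together constitute the action space of the red-side cluster
    return MECHANISM_LIBRARY[(blue_state, n_red)]

print(red_cluster_action_space("strike_formation", 3))
```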
S202: determining the Markov transition probability distribution of the red-side unmanned aerial vehicle according to the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicles at the current moment and the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment.

The relative position information comprises the line-of-sight angle between the two unmanned aerial vehicles, the entry angle between the target unmanned aerial vehicle's velocity vector and the line of sight, the included angle between the velocities of the two unmanned aerial vehicles, the distance between the two unmanned aerial vehicles, and the relative speed between the two unmanned aerial vehicles. The Markov transition probability distribution of the red-side unmanned aerial vehicle is determined as follows:
the total potential field energy \(E_i\) of the red-side unmanned aerial vehicle \(i\) is determined as the sum of its cooperative and adversarial potential field components (the component equations are supplied as figures in the original):

\[E_i=\sum_{j}\left(U_{ij}^{\theta}+U_{ij}^{d}+U_{ij}^{v}\right)+\sum_{k}\left(V_{ik}^{\theta}+V_{ik}^{d}+V_{ik}^{v}\right)\]

where \(U_{ij}^{\theta}\) is the angle cooperative potential field between this red-side unmanned aerial vehicle \(i\) and another red-side unmanned aerial vehicle \(j\), \(U_{ij}^{d}\) is the distance cooperative potential field between them, \(U_{ij}^{v}\) is the velocity cooperative potential field between them, \(V_{ik}^{\theta}\) is the angle potential field between this red-side unmanned aerial vehicle \(i\) and the target blue-side unmanned aerial vehicle \(k\), \(V_{ik}^{d}\) is the distance potential field between them, and \(V_{ik}^{v}\) is the velocity potential field between them;

and the Markov transition probability distribution of the red-side unmanned aerial vehicle is determined as

\[P\left(s_i^{t+1}\mid s_i^{t},a_i^{t}\right)\]

where \(s_i^{t}\) is the motion state of this red-side unmanned aerial vehicle \(i\) at the current moment \(t\), and \(a_i^{t}\) is the execution action of this red-side unmanned aerial vehicle \(i\) at the current moment \(t\).
The mean-field cluster game strategy generation algorithm here may be assisted by expert knowledge and is based on reinforcement learning. The perception information acquired by unmanned aerial vehicle \(i\) is mainly the relative geometric information between pairs of aircraft, which can be expressed as a tuple (rendered as a figure in the original) whose elements respectively represent: the line-of-sight angle between unmanned aerial vehicle \(i\)'s velocity vector and the line of sight between the two aircraft, the target entry angle between the target unmanned aerial vehicle's velocity vector and the line of sight, the distance between the two aircraft, the included angle between the velocities of the two aircraft, and the relative speed of the two aircraft.
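As an illustration, these five quantities can be computed for a planar unmanned aerial vehicle pair as follows; the symbol names and the entry-angle convention (target velocity measured against the line of sight back toward the observer) are assumptions.

```python
# Illustrative sketch only: relative geometry between UAV i (observer) and UAV j.
import numpy as np

def _angle(a, b):
    cosv = np.clip(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0)
    return np.arccos(cosv)

def relative_geometry(p_i, v_i, p_j, v_j):
    los = p_j - p_i                       # line-of-sight vector from i to j
    q = _angle(v_i, los)                  # line-of-sight angle (own velocity vs. LOS)
    eta = _angle(v_j, -los)               # target entry angle (target velocity vs. LOS)
    theta = _angle(v_i, v_j)              # included angle between the two velocities
    d = np.linalg.norm(los)               # distance between the two aircraft
    v_rel = np.linalg.norm(v_i - v_j)     # relative speed
    return q, eta, theta, d, v_rel

geo = relative_geometry(np.array([0.0, 0.0]), np.array([5.0, 0.0]),
                        np.array([100.0, 20.0]), np.array([-4.0, 0.0]))
```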
Consider a cluster of \(N\) unmanned aerial vehicles. Denote the set of the \(M\) friendly unmanned aerial vehicles nearest to the \(i\)-th unmanned aerial vehicle as \(\mathcal{N}_i\), and the set of the \(K\) enemy unmanned aerial vehicles nearest to it as \(\mathcal{M}_i\) (the original notation is rendered as figures). Introducing the potential field concept, the field radiated onto our \(i\)-th unmanned aerial vehicle by unmanned aerial vehicle \(j\) in our cluster is defined as the cooperative potential field \(U_{ij}=(U_{ij}^{\theta},U_{ij}^{d},U_{ij}^{v})\), whose components respectively represent angular consistency, distance consistency and velocity consistency. At the same time, the field radiated onto our \(i\)-th unmanned aerial vehicle by unmanned aerial vehicle \(k\) in the enemy cluster is defined as the adversarial (dynamic) potential field \(V_{ik}=(V_{ik}^{\theta},V_{ik}^{d},V_{ik}^{v})\), whose components respectively represent enemy angle, distance and velocity dynamics.
This step establishes the interaction among cluster individuals: from the relative position relationship with its neighborhood individuals, each cluster individual's total potential field energy and quantified situation can be calculated, and a maneuver strategy adopted accordingly; a numerical sketch of this computation is given below.
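The sketch referenced above illustrates one way this computation could look: toy shapings of the angle, distance and velocity components are summed into a total potential field energy, and per-action energies are turned into a probability distribution with a softmax. The component forms and constants are assumptions; the patent supplies the actual formulas as figures.

```python
# Illustrative sketch only: total potential field energy and a softmax transition
# distribution over candidate actions (higher energy -> higher probability here,
# which is a design assumption of this toy).
import numpy as np

def component_fields(rel_angle, rel_dist, rel_speed):
    u_angle = np.cos(rel_angle)                        # aligned headings score higher
    u_dist = np.exp(-(rel_dist - 30.0) ** 2 / 200.0)   # prefer a nominal spacing
    u_speed = np.exp(-rel_speed ** 2 / 10.0)           # prefer matched speeds
    return u_angle + u_dist + u_speed

def total_energy(friends, enemies):
    # friends, enemies: iterables of (rel_angle, rel_dist, rel_speed) tuples
    coop = sum(component_fields(*f) for f in friends)   # cooperative potential fields
    adv = sum(component_fields(*e) for e in enemies)    # adversarial potential fields
    return coop + adv

def transition_distribution(energy_per_action):
    e = np.asarray(energy_per_action)
    p = np.exp(e - e.max())                             # numerically stable softmax
    return p / p.sum()

probs = transition_distribution(
    [total_energy([(0.1, 28.0, 0.5)], [(0.8, 90.0, 2.0)]),
     total_energy([(0.1, 35.0, 0.5)], [(0.6, 80.0, 2.0)])])
```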
S203: taking the Markov transition probability distribution of the red-side unmanned aerial vehicle and the motion state of the red-side unmanned aerial vehicle at the current moment as independent variables and the execution action of the red-side unmanned aerial vehicle as the dependent variable, and solving for the execution action in the action space that satisfies the Nash equilibrium condition, as the preferred action of the red-side unmanned aerial vehicle.

Specifically, taking the potential-field probability distribution of the red-side unmanned aerial vehicle and its motion state at the current moment as independent variables and its execution action as the dependent variable, solving for the Nash-equilibrium execution action in the action space comprises the following:
determining the execution action satisfying the Nash equilibrium condition as the preferred action of the red-side unmanned aerial vehicle through a value-maximization formula (the original formulas appear as figures):

\[a_i^{*}=\arg\max_{a_i}V_i\left(s_i,a_i\right)\]

where \(V_i\) is the value function of this red-side unmanned aerial vehicle \(i\), \(a_i^{*}\) is its preferred action, and the discount rate \(\gamma\in(0,1)\).
Here \(\mu\) is defined as the overall influence of the cluster on the individual, i.e., the probability distribution produced by the potential field, and the Markov transition probability of the individual state is defined as follows (the original formula is a figure):

\[P\left(s^{t+1}\mid s^{t},a^{t},\mu^{t}\right)\]

where \(s^{t}\) is the current unmanned aerial vehicle state. For a policy \(\pi_i\) and discount rate \(\gamma\), the objective of unmanned aerial vehicle \(i\) is to maximize the expected cumulative discounted reward, i.e.

\[\max_{\pi_i}\ \mathbb{E}\left[\sum_{t=0}^{\infty}\gamma^{t}\,r_i\left(s_i^{t},a_i^{t},\mu^{t}\right)\right]\]
This process establishes the cluster mean-field stochastic game model: the reward obtained by each individual is influenced by the state probability distribution of the other unmanned aerial vehicles, and the individuals influence one another through their states and payoff functions.
For a cluster system containing \(N\) unmanned aerial vehicles, given the state \(s_i\) of the \(i\)-th unmanned aerial vehicle and the probability distribution \(\mu\) resulting from the potential field, an optimal strategy \(\pi_i^{*}\) is determined so as to satisfy the Nash equilibrium condition (written here in standard mean-field game form; the originals are figures):

\[V_i\left(s_i,\mu;\pi_i^{*},\pi_{-i}^{*}\right)\ \ge\ V_i\left(s_i,\mu;\pi_i,\pi_{-i}^{*}\right)\quad\text{for all }\pi_i.\]

The value function is defined as:

\[V_i\left(s,\mu\right)=\mathbb{E}\left[\sum_{t=0}^{\infty}\gamma^{t}\,r_i\left(s^{t},a^{t},\mu^{t}\right)\ \middle|\ s^{0}=s\right]\]

and, by the dynamic programming principle, the Bellman equation is obtained:

\[V_i\left(s,\mu\right)=\max_{a}\left[r_i\left(s,a,\mu\right)+\gamma\sum_{s'}P\left(s'\mid s,a,\mu\right)V_i\left(s',\mu'\right)\right]\]
when the method is used for solving the problem, the state transition probability needs to be obtained, meanwhile, the design of rewarding and punishing functions is needed in the training process, and quantized data is obtained through conversion of expert knowledge. Solving the problems can obtain the optimal sequential decision of the single unmanned aerial vehicle, and the whole cluster can emerge the cluster behavior such as autonomous formation, grouping number, killing mode and the like by considering the whole cluster.
According to the unmanned aerial vehicle game countermeasure strategy generation method described above, the behavior and intent of the blue-side unmanned aerial vehicles are analyzed through deep learning before strategy generation, so that the game countermeasure strategy of the red side is dynamically generated and adjusted, improving both the accuracy and the timeliness of decisions.
In one embodiment of the present application, the trajectory prediction model and the attack target prediction model are trained and generated as follows:

For the dynamics of the enemy cluster, spatial situation prediction modeling, inversion and online optimization based on neural networks are developed, realizing dynamic prediction of enemy cluster behavior characteristics.

To address the rapid change of the battlefield situation, the difficulty of prediction during game countermeasures and the multiple constraints of the game model, this embodiment designs a neural network model based on a long short-term memory (LSTM) network to predict the attack path of the enemy cluster, and then designs a classifier for attack-target prediction based on the LSTM and an iterated dilated convolutional neural network. Deep-learning-based target trajectory prediction requires no model of the target at run time, overcoming the weakness of traditional algorithms in which unknown target aerodynamic parameters make acceleration changes difficult to predict.

Specifically, the LSTM layer first extracts the temporal features of the blue-side unmanned aerial vehicle trajectory, and the IDCNN layer then looks for local features, thereby completing target classification. For trajectory prediction, the LSTM has high computation speed and good real-time performance, so it is used on its own in the design. Flight trajectories of enemy aircraft over a period of time are collected to construct a trajectory database for training the trajectory prediction model.
In the LSTM, the data retained from the previous moment is determined first; this part consists of a forget gate, whose function expression is:

\[f_t=\sigma\left(W_f\cdot\left[h_{t-1},x_t\right]+b_f\right)\]

where \(h_{t-1}\) and \(x_t\) respectively represent the output value of the LSTM network at the previous moment and the input value at the current moment, \(\sigma\) represents the activation function, \(W_f\) represents the forget gate weight, \(b_f\) represents the forget gate bias, and \(f_t\) is the output vector of the forget gate.

The forget gate converts the input and the state of the previous moment into a value between 0 and 1 through the activation function, deciding how much of the previous state is transmitted. The input gate then decides the update information of the cell state; its function expressions are:

\[i_t=\sigma\left(W_i\cdot\left[h_{t-1},x_t\right]+b_i\right)\]
\[\tilde{C}_t=\tanh\left(W_C\cdot\left[h_{t-1},x_t\right]+b_C\right)\]

where \(W_i\) and \(b_i\) respectively represent the weight and bias of the input gate, \(W_C\) and \(b_C\) respectively represent the weight and bias for extracting effective information, \(\tilde{C}_t\) represents the current value to be updated into the cell state, and \(i_t\) is the input gate weight controlling which features of \(\tilde{C}_t\) are finally updated into the cell state \(C_t\). The cell state \(C_t\) function expression is:

\[C_t=f_t\odot C_{t-1}+i_t\odot\tilde{C}_t\]

Finally, the output gate outputs the final result; its function expressions are:

\[o_t=\sigma\left(W_o\cdot\left[h_{t-1},x_t\right]+b_o\right)\]
\[h_t=o_t\odot\tanh\left(C_t\right)\]

where \(W_o\) and \(b_o\) are the weight and bias of the output gate, \(o_t\) is the output-gate weight computed from the cell state at the current moment, and \(h_t\) is the final LSTM output, determined by the output gate and the cell state. Through these three gating units and the cell-state transfer, the LSTM can handle time-series problems and can therefore be used for scenarios such as time-series trajectory prediction.
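For concreteness, one step of the LSTM cell exactly as the gate equations above describe it can be written in NumPy as follows; the concatenated [h, x] weight layout and the toy dimensions are assumptions.

```python
# Illustrative sketch only: a single LSTM cell step following the gate equations above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    z = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)         # forget gate
    i_t = sigmoid(W_i @ z + b_i)         # input gate
    c_tilde = np.tanh(W_c @ z + b_c)     # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde   # cell state update
    o_t = sigmoid(W_o @ z + b_o)         # output gate
    h_t = o_t * np.tanh(c_t)             # final output
    return h_t, c_t

H, X = 8, 3
rng = np.random.default_rng(1)
Ws = [rng.standard_normal((H, H + X)) * 0.1 for _ in range(4)]
bs = [np.zeros(H) for _ in range(4)]
h, c = lstm_step(rng.standard_normal(X), np.zeros(H), np.zeros(H),
                 Ws[0], bs[0], Ws[1], bs[1], Ws[2], bs[2], Ws[3], bs[3])
```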
For the attack target prediction model, the currently available partial trajectory data and expert data are first input into the trained trajectory prediction model to obtain several trajectory prediction results; these trajectory prediction results are then used as the input of the classifier, and the attack target is predicted by the classifier.
In one embodiment of the present application, to address the intent-adversarial, highly dynamic and deceptive characteristics of enemy clusters, interpretation and modeling analysis of enemy intent under cluster game countermeasure conditions is developed based on the acquired situation information and an expert knowledge base, using a fuzzy neural network and an inverse reinforcement learning framework. A tactical intent model is constructed based on the fuzzy neural network and trained on countermeasure samples formed from enemy target attributes and the corresponding tactical intents; neural networks formed from different source data are combined to realize intelligent reasoning about the enemy's combat intent, and the enemy cluster state is deduced from its basic actions, improving the accuracy and rapidity of interpretation. Based on the inverse reinforcement learning framework, an intent equilibrium solution analysis is given, the confidence of the predicted tendency is further evaluated, and interpretability is further improved, supporting the planning and generation of our game strategy. The intention interpretation model here is trained and generated as follows:
A training data set is acquired, comprising a plurality of groups of data samples, each data sample comprising the trajectory sequences of a plurality of sample blue-side unmanned aerial vehicles and the corresponding blue-side unmanned aerial vehicle cluster state.

The fuzzy neural network system can be built on the basis of a fuzzy system fused with a neural network and converted into an adaptive network, realizing the learning process of the T-S fuzzy type.
A target fuzzy neural network model is then constructed, comprising an input layer, a fuzzification layer, a fuzzy inference layer and an output layer.

First the input layer is constructed; the input vector corresponding to the input nodes of the input layer is denoted \(x=(x_1,x_2,\dots,x_n)\).

Then the fuzzification layer is constructed, comprising a preset number of fuzzy nodes determined according to the statistical number of combat modes, each fuzzy node corresponding to a different membership function (a figure in the original; a Gaussian form with the two target parameters is assumed here):

\[\mu_{ij}(x_i)=\exp\left(-\frac{(x_i-c_{ij})^{2}}{\sigma_{ij}^{2}}\right)\]

where \(x_i\) is the trajectory sequence corresponding to input node \(i\) in the input layer, \(\mu_{ij}\) is the membership function corresponding to fuzzy node \(j\) connected to input node \(i\), \(c_{ij}\) is the first target parameter, and \(\sigma_{ij}\) is the second target parameter. Each input node \(i\) is connected to \(m\) fuzzy nodes, each with its own membership function \(\mu_{ij}\), \(j=1,\dots,m\).
A fuzzy inference layer is then built; each inference node represents a rule (i.e., an enemy intent such as jamming, strike or kill) and computes the fitness of that rule. Specifically, the fuzzy inference layer comprises a plurality of inference nodes, the calculation rule of each inference node being:

\[w_{j}=\prod_{i}\mu_{ij}(x_i)\]

Finally, a clarification (defuzzification) layer is built to clarify and output the data of the fuzzy inference layer. The output layer comprises a plurality of output nodes, the clarification function of each output node being:

\[y_{k}=\frac{\sum_{j}p_{kj}\,w_{j}}{\sum_{j}w_{j}}\]

where \(p_{kj}\) is the third target parameter. The training data set is input into the constructed target fuzzy neural network model, and the first, second and third target parameters in the model are adjusted based on a hybrid algorithm combining back propagation and least squares, so as to obtain the pre-trained intention interpretation model.
Specifically, in the training stage the number of nodes in each layer is determined in advance, the fuzzy inference layer is given by specific operation rules, and the parameters to be learned are mainly the first, second and third target parameters above. Then, given countermeasure samples of the target attributes obtained after sensor data fusion and the corresponding tactical intents, the target behavior intent is computed comprehensively from the different data sources and converted into a training data set. After data preparation, the parameter training stage begins: the parameters of the system are trained and adjusted by back propagation, or by the hybrid algorithm of back propagation and least squares. In the hybrid algorithm, least-squares estimation identifies the consequent weight parameters in the forward pass up to the clarification layer, while the error signal is back-propagated in the reverse pass and the membership function parameters are updated by back propagation. The hybrid method reduces the search-space scale of the back propagation method and thus improves the training speed of the fuzzy neural network.
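A minimal sketch of a forward pass through the three-layer fuzzy network described above follows, using the Gaussian membership form assumed earlier (the patent gives the membership function only as a figure); the dimensions are toy values.

```python
# Illustrative sketch only: fuzzification -> rule fitness -> normalized weighted output.
import numpy as np

def fuzzy_forward(x, c, sigma, p):
    # x: (n,) inputs; c, sigma: (n, m) membership centers/widths for m fuzzy nodes
    # per input; p: (m, k) consequent parameters for k output nodes
    mu = np.exp(-((x[:, None] - c) ** 2) / sigma ** 2)   # fuzzification layer
    w = mu.prod(axis=0)                                  # inference layer: rule fitness
    w_norm = w / w.sum()                                 # normalization (clarification)
    return w_norm @ p                                    # crisp outputs

x = np.array([0.2, -0.5])
c = np.zeros((2, 3))
sigma = np.ones((2, 3))
p = np.arange(12.0).reshape(3, 4)
y = fuzzy_forward(x, c, sigma, p)   # (4,) e.g. scores over intent classes
```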
In one embodiment of the present application, the method further comprises storing the acquired cluster states of both sides' unmanned aerial vehicles and the corresponding scene information in the game countermeasure mechanism library, for optimizing and updating the intention interpretation model and the cluster mean-field stochastic game model.

Simulation scenarios are constructed according to different cluster combat task requirements, and an expert-system strategy generator is built. On the premise that the enemy cluster randomly selects its game strategy, our cluster collects battlefield situation information and uses the expert system to generate game countermeasure strategies; a game countermeasure mechanism library is constructed, the strategy selections and the battlefield evolution process are stored, and data support is provided for the design of subsequent game algorithms and strategy models.
In one embodiment of the present application, a simulation example is provided in which unmanned aerial vehicle game countermeasure strategy generation is applied to a typical scenario.
The specific scenario settings are as follows. Enemy cluster: 10 unmanned aerial vehicles strike our 2 ground target areas according to a fixed strategy from a certain height beyond 1500 m. Our cluster: 20 unmanned aerial vehicles in a circling standby flight state at a preset position (500 m from the target areas, at a certain height). Our 2 ground target areas are 100 m apart, and each has an area of 10 square meters (the area and speed units below are rendered as figures in the original and are assumed here). The parameters are set as follows: our/enemy unmanned aerial vehicle field-of-view distance: 150 m / 100 m; our/enemy unmanned aerial vehicle maximum speed: 5 m/s / 4 m/s. The other unmanned aerial vehicle dynamics parameters are set according to typical values.
In the initialization stage, the game countermeasure scenario is first established, and then a nonlinear mathematical model of cluster dynamics is established to participate in the game countermeasure simulation process. After loading the dynamics model, both clusters execute their initialization strategies and update their detection states in real time; once a cluster individual detects an individual of the other cluster, the game countermeasure phase begins.

In the game countermeasure phase, the enemy cluster strategy library is fixed and assigned to groups in the initialization stage; after detecting one of our individuals, the enemy cluster wakes up the tasks within the group and executes them. Unlike the enemy cluster strategy, our cluster strategy is variable: first, the cluster behavior and trajectory prediction module realizes behavior prediction from the enemy cluster situation; then, on the basis of the prediction results, intent interpretation is performed on the enemy cluster, and according to the enemy cluster situation, expert knowledge assists the intelligently learned game strategy, with the dominant strategy selected from the strategy library to play against the enemy cluster.
Fig. 4 is a schematic diagram of a first unmanned aerial vehicle game countermeasure simulation provided in an embodiment of the present application, fig. 5 is a schematic diagram of a second unmanned aerial vehicle game countermeasure simulation, and fig. 6 is a schematic diagram of the change in the numbers of unmanned aerial vehicles. The game stage is shown in figs. 4 and 5. Triangles represent our cluster and circles represent the enemy cluster. At the beginning, our cluster moves toward the enemy in the preset patrol formation, while the enemy cluster forms a certain formation grouping to strike the ground targets (see fig. 4). In the game stage, our cluster first finds enemy individuals within its field of view and passes the enemy information within the cluster; it then realizes enemy cluster trajectory prediction with the LSTM network and cracks the opponent's intent based on the trajectory prediction results, deducing in this scenario simulation that the enemy cluster's combat intent is to attack our ground targets along a fixed route (see fig. 5). Based on the pre-trained expert-knowledge-assisted game technique, our cluster derives from the mechanism library that the 2-versus-1 capture attack mode has the lowest cost, and autonomously forms an attack formation. During the strike, our game strategy changes dynamically: if a target is eliminated, the unmanned aerial vehicle automatically matches the next attack target according to the battle situation (in fig. 5, a dark triangle indicates one of our unmanned aerial vehicles has entered the interception state, a light triangle indicates one still in the standby state, a large black circle indicates an enemy unmanned aerial vehicle that has been locked for attack, and a small light circle indicates one that has not).
To further illustrate the effectiveness of the game strategy, as shown in Fig. 6, the two sides begin to engage at about 120 s, and the enemy cluster is completely destroyed at about 180 s, at which point 10 my unmanned aerial vehicles remain; the best strike at minimum cost is thus achieved while ensuring the safety of the ground target.
Aiming at problems such as low decision accuracy caused by enemy behavioral deception during unmanned aerial vehicle cluster game countermeasures, the unmanned aerial vehicle game countermeasure strategy generation method provided by the embodiment of the present application first predicts enemy behavior and interprets enemy intent before strategy generation. Neural-network-based methods realize dynamic prediction of unmanned aerial vehicle cluster behavior characteristics; a tactical intent reasoning model is then constructed based on a fuzzy neural network and inverse reinforcement learning, the neural networks are trained with data from different sources to realize intelligent reasoning about enemy intent, and the equilibrium solution analysis and confidence of the enemy intent are provided, improving decision accuracy. Aiming at the high dynamics of the battlefield environment and of enemy strategies, for which most current game countermeasure strategy generation algorithms lack speed and pertinence, a game countermeasure strategy set is generated with expert-knowledge assistance: a game countermeasure mechanism library is constructed around typical task links to store experience, an expert-knowledge-assisted intelligent learning game strategy algorithm is built, and the my strategies are dynamically adjusted, achieving high real-time performance and strong pertinence in the decision process.
Based on the same inventive concept, an embodiment of the present application further provides an unmanned aerial vehicle game countermeasure strategy generation system corresponding to the above unmanned aerial vehicle game countermeasure strategy generation method. Since the principle by which the system solves the problem is similar to that of the method, the implementation of the system can refer to the implementation of the method, and repeated description is omitted.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an unmanned plane game countermeasure policy generation system according to an embodiment of the present application. As shown in fig. 7, the unmanned plane game countermeasure policy generation system includes:
the control module is used for determining the execution action of each red unmanned aerial vehicle at the next moment and controlling the red unmanned aerial vehicle to move according to the determined execution action, and comprises:
the track prediction unit 101 is configured to acquire a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle and input it into a pre-trained track prediction model, so as to output a track prediction result of each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle;
the attack target prediction unit 102 is configured to input track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model, so as to output attack target prediction results of each target blue unmanned aerial vehicle;
the intention interpretation unit 103 is configured to input the track prediction results of all the target blue unmanned aerial vehicles and the attack target prediction results of all the target blue unmanned aerial vehicles into a pre-trained intention interpretation model, so as to output a blue unmanned aerial vehicle cluster state;
the game countermeasure unit 104 is configured to input the cluster state of the blue unmanned aerial vehicle, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment, and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model, so as to output a preferred motion of the red unmanned aerial vehicle, and control the red unmanned aerial vehicle to move according to the determined preferred motion; the target blue unmanned aerial vehicle is a blue unmanned aerial vehicle in the red unmanned aerial vehicle monitoring range, and other red unmanned aerial vehicles are red unmanned aerial vehicles in the red unmanned aerial vehicle monitoring range.
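Functionally, the four units chain together once per decision step. A schematic wiring is sketched below; the class, method, and argument names are hypothetical, with the unit objects standing in for the trained models (track predictor, attack-target predictor, intent interpreter, cluster average field game model) of the system.

```python
# Hypothetical sketch of the control module's per-step dataflow through the
# four units described above.
class ControlModule:
    def __init__(self, track_unit, attack_unit, intent_unit, game_unit):
        self.track_unit = track_unit
        self.attack_unit = attack_unit
        self.intent_unit = intent_unit
        self.game_unit = game_unit

    def step(self, red_uav, blue_histories, rel_blue, rel_red, motion_state):
        tracks = self.track_unit(blue_histories)           # track prediction
        targets = self.attack_unit(tracks)                 # attack targets
        cluster_state = self.intent_unit(tracks, targets)  # blue cluster state
        action = self.game_unit(cluster_state, rel_blue,   # preferred action
                                rel_red, motion_state)
        red_uav.execute(action)                            # move the red UAV
        return action
```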
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic device 800 includes a processor 810, a memory 820, and a bus 830.
The memory 820 stores machine-readable instructions executable by the processor 810. When the electronic device 800 is running, the processor 810 communicates with the memory 820 through the bus 830, and when the machine-readable instructions are executed by the processor 810, the steps of the unmanned aerial vehicle game countermeasure strategy generation method in the method embodiment shown in fig. 1 can be executed; for the specific implementation, refer to the method embodiment, which is not repeated here.
The embodiment of the present application further provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the unmanned aerial vehicle game countermeasure strategy generation method in the method embodiment shown in fig. 1 can be executed; for the specific implementation, refer to the method embodiment, which is not repeated here.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a logical functional division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If implemented in the form of software functional units and sold or used as a stand-alone product, the functions may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are merely specific implementations of the present application, used to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application is described in detail with reference to the foregoing embodiments, a person skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions for some of the technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and shall be included in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An unmanned aerial vehicle game countermeasure policy generation method, the method comprising:
for each red unmanned aerial vehicle, determining the execution action of the red unmanned aerial vehicle at the next moment and controlling the red unmanned aerial vehicle to move according to the determined execution action in the following manner:
acquiring a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle, and inputting it into a pre-trained track prediction model, so as to output a track prediction result of each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle;
inputting track prediction results of all target blue-side unmanned aerial vehicles corresponding to all red-side unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model so as to output attack target prediction results of each target blue-side unmanned aerial vehicle;
inputting track prediction results of all target blue unmanned aerial vehicles and attack target prediction results of all target blue unmanned aerial vehicles into a pre-trained intention interpretation model so as to output a blue unmanned aerial vehicle cluster state;
inputting the cluster state of the blue-side unmanned aerial vehicle, the relative position information between the red-side unmanned aerial vehicle and the target blue-side unmanned aerial vehicle at the current moment, the relative position information between the red-side unmanned aerial vehicle and other red-side unmanned aerial vehicles at the current moment and the motion state of the red-side unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model so as to output the preferred motion of the red-side unmanned aerial vehicle and control the red-side unmanned aerial vehicle to move according to the determined preferred motion;
The target blue unmanned aerial vehicle is a blue unmanned aerial vehicle in the red unmanned aerial vehicle monitoring range, and other red unmanned aerial vehicles are red unmanned aerial vehicles in the red unmanned aerial vehicle monitoring range.
2. The method of claim 1, wherein the cluster average field random game model outputs the preferred action of each red unmanned aerial vehicle in the following manner:
determining the action space of the red unmanned aerial vehicle cluster from a game countermeasure mechanism library according to the state of the blue unmanned aerial vehicle cluster;
determining Markov transition probability distribution of the red unmanned aerial vehicle according to the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment and the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment;
and solving the execution action meeting Nash equilibrium conditions in the action space of the red unmanned aerial vehicle cluster to serve as the preferable action of the red unmanned aerial vehicle by taking the Markov transition probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables and taking the execution action of the red unmanned aerial vehicle as dependent variables.
3. The method according to claim 2, wherein the state of the blue unmanned aerial vehicle cluster at least includes formation, grouping and combat mode, and the step of determining the action space of the red unmanned aerial vehicle cluster from the game countermeasure mechanism library according to the state of the blue unmanned aerial vehicle cluster specifically includes:
according to the cluster state of the blue unmanned aerial vehicle and the number of red unmanned aerial vehicles, matching a corresponding unmanned aerial vehicle cluster countermeasure scheme from the game countermeasure mechanism library, wherein the game countermeasure mechanism library comprises a plurality of unmanned aerial vehicle cluster countermeasure schemes, and each unmanned aerial vehicle cluster countermeasure scheme is used for indicating the execution actions, arranged in time order, of each red unmanned aerial vehicle;
and determining the execution actions which are arranged according to the time sequence and correspond to each red unmanned aerial vehicle according to the matched unmanned aerial vehicle cluster countermeasure scheme so as to generate an action space of the red unmanned aerial vehicle cluster.
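To make claim 3 concrete, a mechanism library keyed by blue cluster state and red UAV count could be organized as below; the keys, schemes, and action names are purely illustrative assumptions, not the library of the invention.

```python
# Hypothetical sketch of matching a countermeasure scheme from the game
# countermeasure mechanism library (claim 3): the key combines blue cluster
# state with the number of red UAVs; each scheme lists per-UAV actions in
# time order, which together form the red cluster's action space.
MECHANISM_LIBRARY = {
    ("fixed-route strike", 2, 4): {
        "uav_0": ["intercept_lead", "reengage"],
        "uav_1": ["intercept_lead", "reengage"],
        "uav_2": ["flank_left", "standby"],
        "uav_3": ["flank_right", "standby"],
    },
}

def action_space(blue_state, red_count):
    key = (blue_state["combat_mode"], blue_state["grouping"], red_count)
    scheme = MECHANISM_LIBRARY[key]
    # the action space: one time-ordered action sequence per red UAV
    return list(scheme.values())

space = action_space({"combat_mode": "fixed-route strike", "grouping": 2}, 4)
print(space[0])  # -> ['intercept_lead', 'reengage']
```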
4. The method of claim 2, wherein the relative position information comprises a line-of-sight angle between two unmanned aerial vehicles, an entry angle between the velocity vector of a target unmanned aerial vehicle and the line of sight, an angle between the velocities of two unmanned aerial vehicles, a distance between two unmanned aerial vehicles, and a relative velocity between two unmanned aerial vehicles, and the Markov transition probability distribution of the red unmanned aerial vehicle is determined in the following manner:

the total potential field energy $E_i$ of the red unmanned aerial vehicle $i$ is determined by the following formula:

$$E_i = \phi^{a}_{ij} + \phi^{d}_{ij} + \phi^{v}_{ij} + \psi^{a}_{ik} + \psi^{d}_{ik} + \psi^{v}_{ik}$$

wherein $\phi^{a}_{ij}$ is the angular cooperative potential field between this red unmanned aerial vehicle $i$ and another red unmanned aerial vehicle $j$, $\phi^{d}_{ij}$ is the distance cooperative potential field between this red unmanned aerial vehicle $i$ and the other red unmanned aerial vehicle $j$, $\phi^{v}_{ij}$ is the velocity cooperative potential field between this red unmanned aerial vehicle $i$ and the other red unmanned aerial vehicle $j$, $\psi^{a}_{ik}$ is the angle potential field between this red unmanned aerial vehicle $i$ and the target blue unmanned aerial vehicle $k$, $\psi^{d}_{ik}$ is the distance potential field between this red unmanned aerial vehicle $i$ and the target blue unmanned aerial vehicle $k$, and $\psi^{v}_{ik}$ is the velocity potential field between this red unmanned aerial vehicle $i$ and the target blue unmanned aerial vehicle $k$ [the component formulas appear only as images in the source];

the Markov transition probability distribution $P\big(s_i(t), a_i(t)\big)$ of the red unmanned aerial vehicle is then determined from the total potential field energy $E_i$ [the formula likewise appears only as an image in the source], wherein $s_i(t)$ is the motion state of this red unmanned aerial vehicle $i$ at the current moment $t$, and $a_i(t)$ is the execution action of this red unmanned aerial vehicle $i$ at the current moment $t$.
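Because the formulas of claim 4 survive only as images, the sketch below assumes one common construction: the total energy is the sum of the six named potential-field components, and a Boltzmann (softmax) map turns the energies of candidate actions into a transition probability distribution. The functional forms are assumptions, not the claimed formulas.

```python
import numpy as np

# Assumed construction: total potential field energy as the sum of the six
# named components, and a softmax over negated energies as the Markov
# transition probability distribution (lower energy -> higher probability).
def total_energy(coop_ang, coop_dist, coop_vel, game_ang, game_dist, game_vel):
    return coop_ang + coop_dist + coop_vel + game_ang + game_dist + game_vel

def transition_probabilities(energies, temperature=1.0):
    """Map candidate-action energies to a probability distribution."""
    e = np.asarray(energies, dtype=float)
    logits = -e / temperature
    logits -= logits.max()               # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# energies of three candidate actions for one red UAV (toy values)
E = [total_energy(0.2, 0.5, 0.1, 0.8, 1.2, 0.3),
     total_energy(0.1, 0.4, 0.1, 0.5, 0.9, 0.2),
     total_energy(0.3, 0.7, 0.2, 1.1, 1.5, 0.4)]
print(transition_probabilities(E))       # sums to 1, favors the 2nd action
```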
5. The method according to claim 4, wherein the step of solving, in the action space, the execution action satisfying the Nash equilibrium condition as the preferred action of the red unmanned aerial vehicle, by taking the Markov transition probability distribution of the red unmanned aerial vehicle and the motion state of the red unmanned aerial vehicle at the current moment as independent variables and the execution action of the red unmanned aerial vehicle as the dependent variable, specifically comprises:

determining the execution action satisfying the Nash equilibrium condition as the preferred action of the red unmanned aerial vehicle through the following formula:

$$a_i^{*} = \arg\max_{a_i(t)} V_i\big(s_i(t), a_i(t)\big)$$

wherein $V_i$ is the value function of this red unmanned aerial vehicle $i$, $a_i^{*}$ is the preferred action of this red unmanned aerial vehicle $i$, and $\gamma$ is the discount rate [the explicit form of $V_i$, including the constraint on $\gamma$, appears only as an image in the source].
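In the same spirit, the preferred-action selection of claim 5 can be illustrated as a discounted best-response computation; the transition probabilities, rewards, and discount rate below are toy values, not the claimed model.

```python
import numpy as np

# Illustrative best-response selection: evaluate each action in the action
# space by an expected discounted value and pick the maximizer.
GAMMA = 0.9

def action_value(p_next, reward, v_next, gamma=GAMMA):
    """Q(s, a) = r(s, a) + gamma * sum over s' of P(s'|s, a) * V(s')."""
    return reward + gamma * float(np.dot(p_next, v_next))

v_next = np.array([1.0, 0.2, -0.5])       # value estimates of next states
actions = {                               # action -> (P(s'|s,a), reward)
    "intercept": (np.array([0.7, 0.2, 0.1]), 0.5),
    "standby":   (np.array([0.1, 0.8, 0.1]), 0.0),
    "retreat":   (np.array([0.1, 0.2, 0.7]), -0.2),
}

preferred = max(actions,
                key=lambda a: action_value(actions[a][0], actions[a][1], v_next))
print(preferred)  # -> intercept
```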
6. The method as recited in claim 2, further comprising:
and storing the acquired cluster states and corresponding scene information of the red and blue unmanned aerial vehicles in the game countermeasure mechanism library, for optimizing and updating the intention interpretation model and the cluster average field random game model.
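Claim 6's experience storage amounts to appending observed states and scene information to the mechanism library for later model updates; a trivial sketch with illustrative record fields:

```python
# Hypothetical sketch of claim 6: buffer red/blue cluster states and scene
# information so the intention interpretation model and the cluster average
# field game model can later be re-trained on the accumulated experience.
experience_buffer = []

def store_experience(red_state, blue_state, scene_info):
    experience_buffer.append({
        "red_state": red_state,
        "blue_state": blue_state,
        "scene": scene_info,
    })

store_experience({"formation": "wedge"},
                 {"combat_mode": "fixed-route strike"},
                 {"t": 120.0, "ground_target_safe": True})
print(len(experience_buffer))  # -> 1
```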
7. The method of claim 3, wherein the intention interpretation model is generated by training in the following manner:
acquiring a training data set, wherein the training data set comprises a plurality of groups of data samples, and each data sample comprises a plurality of sample blue unmanned aerial vehicle track sequences and corresponding blue unmanned aerial vehicle cluster states;
constructing a target fuzzy neural network model, wherein the target fuzzy neural network model comprises an input layer, a fuzzification layer, a fuzzy inference layer and an output layer [the formulas below are reconstructed in the standard fuzzy-neural-network form; the originals appear only as images in the source];

the fuzzification layer comprises a preset number of fuzzy nodes determined according to the statistical quantity of combat modes, each fuzzy node corresponds to a different membership function, and the membership function is:

$$\mu_{ij}(x_i) = \exp\!\left(-\frac{(x_i - c_{ij})^2}{\sigma_{ij}^{2}}\right)$$

wherein $x_i$ is the track sequence corresponding to input node $i$ in the input layer, $\mu_{ij}$ is the membership function corresponding to fuzzy node $j$ connected to input node $i$, $c_{ij}$ is the first target parameter, and $\sigma_{ij}$ is the second target parameter;

the fuzzy inference layer comprises a plurality of inference nodes, and the calculation rule of each inference node $k$ is the product of the membership degrees of the fuzzy nodes connected to it:

$$o_k = \prod_{i} \mu_{ij}(x_i)$$

the output layer comprises a plurality of output nodes, and the defuzzification function of each output node $m$ is:

$$y_m = \sum_{k} w_{mk}\, o_k$$

wherein $w_{mk}$ is the third target parameter;

and inputting the training data set into the constructed target fuzzy neural network model, and adjusting the first target parameter, the second target parameter and the third target parameter in the target fuzzy neural network model based on a hybrid algorithm combining back propagation and the least squares method, so as to obtain the pre-trained intention interpretation model.
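Since the formulas of claim 7 also survive only as images, the sketch below assumes the standard ANFIS-style forms that fit the claim's parameter names — Gaussian memberships (first and second target parameters), product-rule inference, and a weighted-sum output (third target parameter) — which is also the combination classically trained by a hybrid backpropagation/least-squares algorithm. All sizes and values are illustrative.

```python
import numpy as np

# Assumed ANFIS-style forward pass: Gaussian membership functions with
# centers c (first target parameter) and widths sigma (second target
# parameter), product-rule inference nodes, and a weighted-sum output
# layer with weights w (third target parameter).
rng = np.random.default_rng(0)
n_inputs, n_fuzzy, n_outputs = 4, 3, 2    # illustrative layer sizes

c = rng.normal(size=(n_inputs, n_fuzzy))        # first target parameter
sigma = np.full((n_inputs, n_fuzzy), 1.0)       # second target parameter
w = rng.normal(size=(n_outputs, n_fuzzy))       # third target parameter

def forward(x):
    # fuzzification: membership of each input in each fuzzy set
    mu = np.exp(-((x[:, None] - c) ** 2) / sigma ** 2)  # (n_inputs, n_fuzzy)
    # inference: product over inputs gives each rule's firing strength
    o = mu.prod(axis=0)                                  # (n_fuzzy,)
    # output layer: weighted sum (defuzzification)
    return w @ o                                         # (n_outputs,)

x = rng.normal(size=n_inputs)              # e.g. features of a track sequence
print(forward(x))                          # coarse blue-cluster-state scores
```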
8. An unmanned aerial vehicle game countermeasure policy generation system, the system comprising:
the control module is used for determining the execution action of each red unmanned aerial vehicle at the next moment and controlling the red unmanned aerial vehicle to move according to the determined execution action, and the control module comprises:
the track prediction unit is used for acquiring a historical track sequence of at least one target blue unmanned aerial vehicle collected by the red unmanned aerial vehicle and inputting it into a pre-trained track prediction model, so as to output a track prediction result of each target blue unmanned aerial vehicle corresponding to the red unmanned aerial vehicle;
The attack target prediction unit is used for inputting track prediction results of all target blue unmanned aerial vehicles corresponding to all red unmanned aerial vehicles at the current moment into a pre-trained attack target prediction model so as to output attack target prediction results of each target blue unmanned aerial vehicle;
the intention interpretation unit is used for inputting track prediction results of all the target blue unmanned aerial vehicles and attack target prediction results of all the target blue unmanned aerial vehicles into a pre-trained intention interpretation model so as to output a blue unmanned aerial vehicle cluster state;
the game countermeasure unit is used for inputting the cluster state of the blue unmanned aerial vehicle, the relative position information between the red unmanned aerial vehicle and the target blue unmanned aerial vehicle at the current moment, the relative position information between the red unmanned aerial vehicle and other red unmanned aerial vehicles at the current moment and the motion state of the red unmanned aerial vehicle at the current moment into a pre-trained cluster average field random game model so as to output the preferred motion of the red unmanned aerial vehicle and control the red unmanned aerial vehicle to move according to the determined preferred motion;
the target blue unmanned aerial vehicle is a blue unmanned aerial vehicle in the red unmanned aerial vehicle monitoring range, and other red unmanned aerial vehicles are red unmanned aerial vehicles in the red unmanned aerial vehicle monitoring range.
9. An electronic device, comprising: a processor, a memory and a bus, said memory storing machine readable instructions executable by said processor, said processor and said memory in communication over the bus when the electronic device is running, said processor executing said machine readable instructions to perform the steps of the unmanned aerial vehicle game countermeasure policy generation method of any of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the unmanned aerial vehicle game countermeasure policy generation method of any of claims 1 to 7.