CN114815891A - PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method - Google Patents

PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method

Info

Publication number
CN114815891A
CN114815891A (Application No. CN202210525303.XA)
Authority
CN
China
Prior art keywords
unmanned aerial vehicle, enclosure, target, IDQN
Prior art date
Legal status
Pending
Application number
CN202210525303.XA
Other languages
Chinese (zh)
Inventor
李波
黄晶益
谢国燕
杨志鹏
杨帆
万开方
高晓光
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202210525303.XA
Publication of CN114815891A
Status: Pending

Classifications

    • G05D1/683 Intercepting moving targets
    • G05D1/101 Simultaneous control of position or course in three dimensions, specially adapted for aircraft
    • G05D1/2464 Arrangements for determining position or orientation using environment maps (e.g. SLAM) using an occupancy grid
    • G05D1/43 Control of position or course in two dimensions
    • G05D1/6983 Coordinated control of two or more vehicles; control allocation by distributed or sequential control
    • G05D2101/15 Control architectures using artificial intelligence [AI] techniques, e.g. machine learning / neural networks
    • G05D2105/35 Specific applications of the controlled vehicles: combat
    • G05D2109/20 Types of controlled vehicles: aircraft, e.g. drones

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention provides a PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method. A grid digital map and an unmanned aerial vehicle motion model are first constructed. A multi-unmanned aerial vehicle neural network model is then deployed with a deep Q-network algorithm through the interaction of each unmanned aerial vehicle with the environment, and the algorithm model is optimized with a prioritized experience replay strategy. A state space, an action space and a reward function are designed specifically for the multi-unmanned aerial vehicle enclosure capture task. The resulting multi-unmanned aerial vehicle enclosure capture tactical model can formulate effective enclosure capture tactics in complex obstacle environments and realize the capture of maneuvering targets. The method effectively improves the sampling efficiency of experience samples, alleviates the slow training of unmanned aerial vehicle decision models in complex task scenarios, is applicable to multi-unmanned aerial vehicle enclosure capture and autonomous obstacle avoidance tasks in complex dynamic environments, and yields a tactical model with high stability.

Description

PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method
Technical Field
The invention relates to the field of multi-agent systems and unmanned aerial vehicle intelligent decision making, in particular to a multi-unmanned aerial vehicle enclosure tactical method.
Background
Unmanned aerial vehicles offer strong concealment, high safety and other advantages, and provide a new way to meet the multi-vehicle cooperation and low casualty rate required by modern information-based defense tactics. In scenarios where an enemy aircraft intrudes into friendly airspace to conduct illegal reconnaissance, it is of great significance to form a formation of defensive unmanned aerial vehicles that can, according to the situational environment, automatically enclose and expel the target or accompany and monitor it.
Existing research on multi-unmanned aerial vehicle enclosure capture tactics is limited; most approaches estimate the target position in real time with an artificial-intelligence method and then plan a corresponding tracking path to approach and capture the target. Patent publication CN112241173A proposes an artificial-potential-field-based intelligent planning method for multi-agent rendezvous points: the target is converted into virtual rendezvous points, the repulsive forces between agents and between agents and obstacles are computed with an artificial potential field model, and the positions and path information of the virtual rendezvous points are derived. However, that method does not consider the large computational load of the model in a dynamic environment and cannot guarantee real-time multi-agent decision making. In recent years, deep reinforcement learning has provided a new approach to real-time online intelligent decision making for unmanned systems. Patent publication CN113625775A provides a multi-unmanned aerial vehicle enclosure capture method combining state prediction with DDPG: the states of the unmanned aerial vehicles are predicted with a least-squares method, the unmanned aerial vehicle model is trained with the deep reinforcement learning DDPG algorithm, and the trained model is deployed in a multi-unmanned aerial vehicle system to make cooperative capture decisions. However, when that method trains the decision model, the volume of training samples is large and the variable types are complex, so training is inefficient, and the resulting multi-unmanned aerial vehicle capture model has poor stability and certain limitations.
Prioritized experience replay is a deep reinforcement learning optimization method: by computing the importance of each experience sample and ranking the samples by priority, it increases the usage rate of high-priority samples and thereby speeds up agent training. How to introduce prioritized experience replay into multi-agent deep reinforcement learning, combine it with a complex multi-unmanned aerial vehicle enclosure capture tactical model to improve the autonomous behavior of each unmanned aerial vehicle, and finally capture the target through cooperative decisions has therefore become a difficult problem in applying deep reinforcement learning to multi-unmanned aerial vehicle intelligent decision making.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a multi-unmanned aerial vehicle enclosure capture tactical method based on a Prioritized Experience Replay Independent Deep Q-Network (PER-IDQN). Specifically, a grid digital map and an unmanned aerial vehicle motion model are constructed; a multi-unmanned aerial vehicle neural network model is deployed with the Deep Q-Network (DQN) algorithm through the interaction of each unmanned aerial vehicle with the environment, and the algorithm model is optimized with a Prioritized Experience Replay (PER) strategy; a state space, an action space and a reward function are then designed specifically for the multi-unmanned aerial vehicle enclosure capture task. The finally constructed multi-unmanned aerial vehicle enclosure capture tactical model can formulate effective enclosure capture tactics in complex obstacle environments and realize the capture of maneuvering targets.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step 1: constructing a grid digital map model and an unmanned aerial vehicle model;
step 2: constructing a multi-unmanned aerial vehicle enclosure decision model based on a PER-IDQN algorithm;
step 3: constructing and training a multi-unmanned aerial vehicle enclosure capture decision model based on the PER-IDQN algorithm; each unmanned aerial vehicle inputs its state information into its neural network, the trained PER-IDQN network maps the state to a flight action, and the enclosure unmanned aerial vehicles realize the capture of the target through cooperative decisions.
The steps of constructing the grid digital map model and the unmanned aerial vehicle model are as follows:
step 1-1: in order to conveniently quantify the specific position of the unmanned aerial vehicle, the whole airspace is uniformly divided into grids, each grid being a square with side length l kilometers; the task scene contains a × b grids, so its total width is l_width = a·l kilometers and its total length is l_length = b·l kilometers;
step 1-2: setting the speed of the capture unmanned aerial vehicle to be l kilometer/time step, and setting the speed of the target unmanned aerial vehicle to be n x l kilometer/time step;
step 1-3: setting the size of the action space of the unmanned aerial vehicle to be 4, namely the unmanned aerial vehicle can only move in four directions, namely up, down, left and right directions in each step;
step 1-4: the detectable range of each unmanned aerial vehicle is set as a circular area of radius l kilometers, which in the grid scene is approximated by the 3 × 3 nine-grid neighborhood centered on the unmanned aerial vehicle.
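For illustration, steps 1-1 to 1-4 can be sketched as a minimal grid environment and single-step motion model; the class and parameter names below (GridWorld, cell_len, obstacle_ratio, etc.) are assumptions made for this sketch and are not taken from the invention.

```python
import numpy as np

class GridWorld:
    """Minimal grid digital map: 0 = free cell, 1 = obstacle (steps 1-1 to 1-4)."""
    def __init__(self, a=80, b=40, cell_len=0.1, obstacle_ratio=0.05, seed=0):
        self.a, self.b, self.cell_len = a, b, cell_len        # a x b cells, each cell_len km on a side
        rng = np.random.default_rng(seed)
        self.grid = (rng.random((a, b)) < obstacle_ratio).astype(int)

    def in_bounds(self, pos):
        x, y = pos
        return 0 <= x < self.a and 0 <= y < self.b

    def is_free(self, pos):
        return self.in_bounds(pos) and self.grid[pos] == 0

def step_uav(pos, action, world, speed_cells=1):
    """Apply one action; the pursuer moves 1 cell and the target n cells per time step."""
    dx, dy = action                                           # action in {(0,-1), (0,1), (-1,0), (1,0)}
    new = (pos[0] + dx * speed_cells, pos[1] + dy * speed_cells)
    return new if world.is_free(new) else pos                 # blocked or out-of-map moves leave the UAV in place
```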
The step 2 of constructing a multi-unmanned aerial vehicle trapping decision model based on a PER-IDQN algorithm comprises the following steps:
step 2-1: the action space A of the enclosure unmanned aerial vehicle is set as follows:
A = [(0, -l), (0, l), (-l, 0), (l, 0)]
where (0, -l), (0, l), (-l, 0) and (l, 0) represent the 4 actions of the unmanned aerial vehicle moving down, up, left and right, and l represents the side length of each grid;
step 2-2: setting the state space S of the enclosure unmanned aerial vehicle as follows:
S = [S_uav, S_teamer, S_obser, S_target, S_finish]
where S_uav, S_teamer, S_obser, S_target and S_finish respectively represent the unmanned aerial vehicle's own state information, the information of the other friendly unmanned aerial vehicles, the unmanned aerial vehicle's detection information, the target information and the task state information;
specifically, for the i-th unmanned aerial vehicle in the multi-vehicle enclosure system, its own state information is set as S_uav^i = [x_i, y_i], where x_i and y_i are the horizontal and vertical coordinates of the i-th unmanned aerial vehicle;
for the i-th unmanned aerial vehicle, the obtainable friendly-vehicle state information S_teamer^i consists of the coordinates (x_j, y_j) of the other friendly unmanned aerial vehicles, where n represents the number of unmanned aerial vehicles;
the observation information of unmanned aerial vehicle i is set as S_obser^i = [o_1, …, o_m, …], where the detection readings o_m represent the exploration information of the enclosure unmanned aerial vehicle at the positions of its surrounding nine-grid neighborhood;
in addition, combining the relative distance and bearing information of the target with respect to unmanned aerial vehicle i, the obtainable target information of the i-th enclosure unmanned aerial vehicle is set as S_target^i = [d_i, θ_i], where d_i and θ_i respectively represent the distance and relative azimuth between the enclosure unmanned aerial vehicle and the target, and x_e and y_e are the horizontal and vertical coordinates of the escaping target;
in addition, to help the enclosure unmanned aerial vehicles effectively complete the capture of the target, a sub-state quantity S_finish^i is set for the i-th enclosure unmanned aerial vehicle, indicating whether the target has been completely captured;
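As a concrete illustration of how such a state vector could be assembled for unmanned aerial vehicle i, a minimal sketch is given below; the field layout and helper names (build_state, world.is_free, etc.) are assumptions for illustration rather than the exact encoding of the invention.

```python
import numpy as np

def build_state(i, uav_pos, target_pos, world, captured):
    """Concatenate [own position | teammate positions | nine-grid detection | target distance/azimuth | task flag]."""
    xi, yi = uav_pos[i]
    own = [xi, yi]                                                         # S_uav
    teammates = [c for j, p in enumerate(uav_pos) if j != i for c in p]    # S_teamer
    # S_obser: detection readings of the surrounding nine-grid cells (1 = obstacle or out of map, 0 = free)
    obser = [0 if world.is_free((xi + dx, yi + dy)) else 1
             for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    xe, ye = target_pos
    d = np.hypot(xe - xi, ye - yi)                                         # distance d_i to the escaping target
    theta = np.arctan2(ye - yi, xe - xi)                                   # relative azimuth theta_i
    return np.array(own + teammates + obser + [d, theta, float(captured)], dtype=np.float32)
```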
step 2-3: considering the three decision processes involved in the multi-unmanned aerial vehicle enclosure capture tactics (maneuvering approach to the target, cooperative capture and autonomous obstacle avoidance), the reward function R of each individual enclosure unmanned aerial vehicle is set as follows:
R = σ_1·r_pos + σ_2·r_safe + σ_3·r_effi + σ_4·r_task
where r_pos, r_safe, r_effi and r_task respectively represent the position reward, the safe-flight reward, the efficient-flight reward and the task-completion reward, and σ_1 to σ_4 are the corresponding weights of the rewards;
specifically, the position sub-reward is set as follows:
r_pos = (|x_e - x_i| + |y_e - y_i|) - (|x_e - x_i| + |y_e - y_i|)′
where (|x_e - x_i| + |y_e - y_i|) and (|x_e - x_i| + |y_e - y_i|)′ respectively represent the distance between the unmanned aerial vehicle and the target at the current moment and at the next moment;
the safe-flight sub-reward of the enclosure unmanned aerial vehicle is given by a piecewise expression (shown only as an image in the original and not reproduced here);
the efficient-flight sub-reward of the enclosure unmanned aerial vehicle is set as follows:
r_effi = -n_stay
where n_stay represents the number of times the enclosure unmanned aerial vehicle has stayed at its current grid position;
the task-completion sub-reward of the enclosure unmanned aerial vehicle is likewise given by a piecewise expression (shown only as an image in the original and not reproduced here);
step 2-4: the multi-unmanned aerial vehicle capture judgment condition is set as follows: when the distance between the target and each enclosure unmanned aerial vehicle is one unit grid, the target is regarded as unable to escape and the capture task is completed.
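A hedged sketch of this composite reward follows; the weight values and the piecewise safe-flight and task-completion terms are illustrative assumptions, since the patent gives those sub-rewards only as images.

```python
def reward(prev_dist, next_dist, hit_obstacle, n_stay, captured,
           sigma=(1.0, 1.0, 0.1, 1.0)):
    """R = s1*r_pos + s2*r_safe + s3*r_effi + s4*r_task (weights are illustrative)."""
    r_pos  = prev_dist - next_dist          # positive when the pursuer closes in on the target
    r_safe = -1.0 if hit_obstacle else 0.0  # assumed collision penalty
    r_effi = -n_stay                        # discourage loitering on the same cell
    r_task = 10.0 if captured else 0.0      # assumed completion bonus
    s1, s2, s3, s4 = sigma
    return s1 * r_pos + s2 * r_safe + s3 * r_effi + s4 * r_task
```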
Step 3: constructing and training the multi-unmanned aerial vehicle enclosure capture decision model based on the PER-IDQN algorithm:
step 3-1: for each enclosure unmanned aerial vehicle, construct the main BP neural network with parameters θ_i and the state-action value function Q(s_t^i, a_t^i; θ_i) of the PER-IDQN algorithm, where the inputs s_t^i and a_t^i are respectively the state and action of unmanned aerial vehicle i at time t; copy the main network parameters θ_i to the target network θ_i′, i.e. θ_i → θ_i′, where i represents the unmanned aerial vehicle serial number;
step 3-2: set the size of the experience replay queue to M, the discount factor to γ, the maximum number of episodes to E, the maximum number of steps per episode to T and the batch size to N_batch; set the episode counter e = 0;
step 3-3: initialize the n states s_1, …, s_n of the enclosure unmanned aerial vehicles and set the current time t = 0;
step 3-4: generate a random number z and, for each unmanned aerial vehicle i, select the action
a_t^i = a random action from A, if z < ε_greedy; otherwise a_t^i = argmax_a Q(s_t^i, a; θ_i)
where ε_greedy is the greedy coefficient and argmax_a Q(s_t^i, a; θ_i) is the action corresponding to the maximum Q value output by the main network;
step 3-5: execute the action set a_1, …, a_n, calculate the reward values r_1, …, r_n, update the states to s′_1, …, s′_n, compute the priorities p_1, …, p_n, and store the transitions together in the experience replay queue;
step 3-6: sample N_batch experience samples according to the probability P(j) = p_j^α / Σ_k p_k^α, where j represents the serial number of the extracted experience sample, p_j represents its priority, and the parameter α adjusts the degree of prioritized sampling;
calculate the importance-sampling weight coefficient w_j:
w_j = (M·P(j))^(-β) / max_i w_i
where β is a hyper-parameter used to adjust the influence of importance sampling on the PER algorithm and the convergence rate of the model;
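The prioritized sampling and importance-sampling correction described above can be sketched in a few lines (a minimal NumPy illustration following the standard PER formulas, not the exact implementation of the invention):

```python
import numpy as np

def per_sample(priorities, n_batch, alpha=0.6, beta=0.4, rng=np.random.default_rng()):
    """Sample indices with P(j) = p_j^alpha / sum_k p_k^alpha and return IS weights."""
    p = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = p / p.sum()
    idx = rng.choice(len(probs), size=n_batch, p=probs)
    M = len(probs)
    w = (M * probs[idx]) ** (-beta)          # importance-sampling correction
    w /= w.max()                             # normalize by the largest weight
    return idx, w.astype(np.float32)
```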
calculate the temporal-difference error at the current moment:
δ_t^i = Y_t^i - Q(s_t^i, a_t^i; θ_i)
where r_{t+1}^i denotes the reward obtained by unmanned aerial vehicle i at time t + 1;
calculate the target value Y_t^i:
Y_t^i = r_{t+1}^i + γ·max_a Q(s′_j, a; θ_i′)
where γ is the reward discount factor, j is the sample number, and θ_i′ represents the target network of the i-th agent;
combining the importance weights w_j, update the parameters of the current network by minimizing the loss function L(θ_i):
L(θ_i) = (1/N_batch)·Σ_j w_j·(Y_j^i - Q(s_j^i, a_j^i; θ_i))²
step 3-7: update the target network parameters of each unmanned aerial vehicle agent respectively:
θ_i′ ← τ·θ_i + (1 - τ)·θ_i′
where τ represents the update scale factor;
step 3-8: increment the step counter t by 1 and perform the judgment: if t < T and the multi-unmanned aerial vehicle capture judgment condition is not met, return to step 3-4; otherwise, go to step 3-9;
step 3-9: increment the episode counter e by 1 and perform the judgment: if e < E, return to step 3-3; otherwise, finish the training and go to step 3-10;
step 3-10: terminate the PER-IDQN network training process and save the current network parameters; load the saved parameters into the multi-unmanned aerial vehicle capture system; at each moment, each unmanned aerial vehicle inputs its state information into its neural network, the trained PER-IDQN network outputs the flight action of the unmanned aerial vehicle, and the enclosure unmanned aerial vehicles realize the capture of the target through cooperative decisions.
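Steps 3-4 to 3-7 can be combined into a single per-agent update, sketched below in a PyTorch flavour; the network object, buffer interface and hyper-parameter values (gamma, tau) are assumptions for illustration, not the exact implementation of the invention.

```python
import torch

def per_idqn_update(q_net, target_net, optimizer, batch, weights, gamma=0.95, tau=0.01):
    """One gradient step for agent i: importance-weighted TD loss plus soft target-network update."""
    s, a, r, s_next, done = batch            # tensors for the N_batch transitions drawn with per_sample()
    w = torch.as_tensor(weights)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)                    # Q(s_t, a_t; theta_i)
    with torch.no_grad():
        y = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values   # TD target Y_t^i
    td_error = y - q_sa                       # used afterwards to refresh the sample priorities
    loss = (w * td_error.pow(2)).mean()       # importance-weighted mean squared error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # soft update: theta_i' <- tau*theta_i + (1 - tau)*theta_i'
    for p, p_targ in zip(q_net.parameters(), target_net.parameters()):
        p_targ.data.mul_(1 - tau).add_(tau * p.data)
    return td_error.abs().detach()            # new priorities p_j = |delta_j|
```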
The beneficial effects of the proposed PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method are as follows:
(1) The constructed multi-unmanned aerial vehicle enclosure capture decision system does not require the tactics of each unmanned aerial vehicle to be specified individually; tactical and task cooperation is achieved through environment sensing and information sharing among the unmanned aerial vehicles, and the finally formulated multi-unmanned aerial vehicle enclosure capture tactics can realize the capture of maneuvering targets.
(2) The method introduces the prioritized experience replay (PER) strategy into the IDQN algorithm, which effectively improves the sampling efficiency of experience samples and alleviates the slow training of the unmanned aerial vehicle decision model in complex task scenarios. The finally constructed multi-unmanned aerial vehicle enclosure capture tactical model is more stable and is applicable to multi-unmanned aerial vehicle capture and autonomous obstacle avoidance tasks in complex dynamic environments.
Drawings
Fig. 1 is a schematic view of unmanned aerial vehicle detection.
Fig. 2 is a schematic diagram of a position relationship between the unmanned aerial vehicle for enclosure and the target.
Fig. 3 is a schematic diagram of training of a multi-unmanned aerial vehicle capture model based on PER-IDQN.
Fig. 4 is a schematic diagram of a multi-drone enclosure capture.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The invention provides a PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method, whose overall flow is shown in Fig. 3. The technical solution is further described clearly and completely below with reference to the accompanying drawings and specific embodiments:
step 1: constructing a grid digital map model and an unmanned aerial vehicle model;
step 1-1: in order to conveniently quantify the specific position of the unmanned aerial vehicle, the whole airspace is divided into grids, the side length of each grid is set to 0.1 kilometer, and the total width and total length of the task scene are set to l_width = 8 km and l_length = 4 km, respectively;
step 1-2: setting the speed of the capture unmanned aerial vehicle to be 0.1 kilometer per time step, and setting the speed of the target unmanned aerial vehicle to be 0.2 kilometer per time step;
step 1-3: setting the size of the action space of the unmanned aerial vehicle to be 4, namely the unmanned aerial vehicle can only move in four directions, namely up, down, left and right directions in each step;
step 1-4: setting the detectable range of each unmanned aerial vehicle as a circular area of radius 0.1 kilometer, which in the grid scene is approximated by the 3 × 3 nine-grid neighborhood centered on the unmanned aerial vehicle;
step 2: constructing a multi-unmanned aerial vehicle enclosure decision model based on a PER-IDQN algorithm;
step 2-1: setting the action space A of the enclosure unmanned aerial vehicle as follows:
A = [(0, -l), (0, l), (-l, 0), (l, 0)]
where (0, -l), (0, l), (-l, 0) and (l, 0) represent the 4 actions of the unmanned aerial vehicle moving down, up, left and right, and l represents the side length of each grid;
step 2-2: setting the state space S of the enclosure unmanned aerial vehicle as follows:
S = [S_uav, S_teamer, S_obser, S_target, S_finish]
where S_uav, S_teamer, S_obser, S_target and S_finish respectively represent the unmanned aerial vehicle's own state information, the information of the other friendly unmanned aerial vehicles, the unmanned aerial vehicle's detection information, the target information and the task state information;
specifically, for the i-th unmanned aerial vehicle in the multi-vehicle enclosure system, its own state information is set as S_uav^i = [x_i, y_i], where x_i and y_i are the horizontal and vertical coordinates of the i-th unmanned aerial vehicle;
for the i-th unmanned aerial vehicle, the obtainable friendly-vehicle state information S_teamer^i consists of the coordinates (x_j, y_j) of the other friendly unmanned aerial vehicles, where n represents the number of unmanned aerial vehicles;
the observation information of unmanned aerial vehicle i is set as S_obser^i = [o_1, …, o_m, …], where the detection readings o_m represent the exploration information of the enclosure unmanned aerial vehicle at the positions of its surrounding nine-grid neighborhood; the unmanned aerial vehicle detection information is shown in Fig. 1;
in addition, combining the relative distance and bearing information of the target with respect to unmanned aerial vehicle i, the obtainable target information of the i-th enclosure unmanned aerial vehicle is set as S_target^i = [d_i, θ_i], where d_i and θ_i respectively represent the distance and relative azimuth between the enclosure unmanned aerial vehicle and the target, and x_e and y_e are the horizontal and vertical coordinates of the escaping target; the positional relationship between the enclosure unmanned aerial vehicle and the target is shown in Fig. 2;
in addition, to help the enclosure unmanned aerial vehicles effectively complete the capture of the target, a sub-state quantity S_finish^i is set for the i-th enclosure unmanned aerial vehicle, indicating whether the target has been completely captured;
step 2-3: considering the decision processes involved in the multi-unmanned aerial vehicle enclosure capture tactics, such as maneuvering approach to the target, cooperative capture and autonomous obstacle avoidance, the reward function R of each individual enclosure unmanned aerial vehicle is set as follows:
R = σ_1·r_pos + σ_2·r_safe + σ_3·r_effi + σ_4·r_task
where r_pos, r_safe, r_effi and r_task respectively represent the position reward, the safe-flight reward, the efficient-flight reward and the task-completion reward, and σ_1 to σ_4 are the corresponding weights of the rewards;
specifically, the position sub-reward is set as follows:
r_pos = (|x_e - x_i| + |y_e - y_i|) - (|x_e - x_i| + |y_e - y_i|)′
where (|x_e - x_i| + |y_e - y_i|) and (|x_e - x_i| + |y_e - y_i|)′ respectively represent the distance between the unmanned aerial vehicle and the target at the current moment and at the next moment;
the safe-flight sub-reward of the enclosure unmanned aerial vehicle is given by a piecewise expression (shown only as an image in the original and not reproduced here);
the efficient-flight sub-reward of the enclosure unmanned aerial vehicle is set as follows:
r_effi = -n_stay
where n_stay represents the number of times the enclosure unmanned aerial vehicle has stayed at its current grid position;
the task-completion sub-reward of the enclosure unmanned aerial vehicle is likewise given by a piecewise expression (shown only as an image in the original and not reproduced here);
step 2-4: setting the multi-unmanned aerial vehicle capture judgment condition: when the distance between the target and each enclosure unmanned aerial vehicle is one unit grid, the target is regarded as unable to escape and the capture task is completed;
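As a sketch, the capture judgment condition of step 2-4 can be checked with a simple Manhattan-distance test; treating "one unit grid distance" as a Manhattan distance of at most one cell is an assumption made for this illustration.

```python
def capture_complete(target_pos, pursuer_positions):
    """Step 2-4: the target is considered unable to escape when every pursuer is within one grid cell."""
    xe, ye = target_pos
    return all(abs(xe - xi) + abs(ye - yi) <= 1 for xi, yi in pursuer_positions)
```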
and step 3: constructing a multi-unmanned aerial vehicle capture decision model and training the model based on a deep reinforcement learning PER-IDQN algorithm;
step 3-1: for each enclosure unmanned aerial vehicle, construct the main BP neural network with parameters θ_i and the state-action value function Q(s_t^i, a_t^i; θ_i) of the PER-IDQN algorithm, where the inputs s_t^i and a_t^i are respectively the state and action of unmanned aerial vehicle i at time t; copy the main network parameters θ_i to the target network θ_i′, i.e. θ_i → θ_i′, where i represents the unmanned aerial vehicle serial number;
step 3-2: set the size of the experience replay queue to M, the discount factor to γ, the maximum number of episodes to E, the maximum number of steps per episode to T and the batch size to N_batch; set the episode counter e = 0;
step 3-3: initialize the n states s_1, …, s_n of the enclosure unmanned aerial vehicles and set the current time t = 0;
step 3-4: generate a random number z and, for each unmanned aerial vehicle i, select the action
a_t^i = a random action from A, if z < ε_greedy; otherwise a_t^i = argmax_a Q(s_t^i, a; θ_i)
where ε_greedy is the greedy coefficient and argmax_a Q(s_t^i, a; θ_i) is the action corresponding to the maximum Q value output by the main network;
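A minimal sketch of this ε-greedy selection (the function name, the PyTorch-style q_net callable and the random-number handling are illustrative assumptions):

```python
import numpy as np
import torch

def select_action(q_net, state, n_actions, eps_greedy, rng=np.random.default_rng()):
    """Step 3-4: explore with probability eps_greedy, otherwise act greedily on the main network."""
    if rng.random() < eps_greedy:                       # random number z falls in the exploration range
        return int(rng.integers(n_actions))             # random action from A
    q_values = q_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return int(q_values.argmax(dim=1).item())           # action with the maximum Q value
```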
step 3-5: execute the action set a_1, …, a_n, calculate the reward values r_1, …, r_n, update the states to s′_1, …, s′_n, compute the priorities p_1, …, p_n, and store the transitions together in the experience replay queue;
step 3-6: sample N_batch experience samples according to the probability P(j) = p_j^α / Σ_k p_k^α, where j represents the serial number of the extracted experience sample, p_j represents its priority, and the parameter α adjusts the degree of prioritized sampling;
calculate the importance-sampling weight coefficient w_j:
w_j = (M·P(j))^(-β) / max_i w_i
where M is the size of the experience replay queue, and β is a hyper-parameter used to adjust the influence of importance sampling on the PER algorithm and the convergence rate of the model;
calculate the temporal-difference error at the current moment:
δ_t^i = Y_t^i - Q(s_t^i, a_t^i; θ_i)
where r_{t+1}^i denotes the reward obtained by unmanned aerial vehicle i at time t + 1;
calculate the target value Y_t^i:
Y_t^i = r_{t+1}^i + γ·max_a Q(s′_j, a; θ_i′)
where γ is the reward discount factor, j is the sample number, and θ_i′ represents the target network of the i-th agent;
combining the importance weights w_j, update the parameters of the current network by minimizing the loss function L(θ_i):
L(θ_i) = (1/N_batch)·Σ_j w_j·(Y_j^i - Q(s_j^i, a_j^i; θ_i))²
Step 3-7: respectively updating the target network parameters of each unmanned aerial vehicle agent:
θ i′ ←τθ i +(1-τ)θ i′
τ represents an update scale factor;
step 3-8: update the step counter t to t + 1 and perform the judgment: if t < T and the multi-unmanned aerial vehicle capture judgment condition of step 2-4 is not met, return to step 3-4; otherwise, go to step 3-9;
step 3-9: update the episode counter e to e + 1 and perform the judgment: if e < E, return to step 3-3; otherwise, finish the training and go to step 3-10;
step 3-10: terminate the PER-IDQN network training process and save the current network parameters; load the saved parameters into the multi-unmanned aerial vehicle capture system. At each moment, each unmanned aerial vehicle inputs its state information into its neural network, the trained PER-IDQN network outputs the flight action of the unmanned aerial vehicle, and the enclosure unmanned aerial vehicles realize the capture of the target through cooperative decisions.
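The deployment phase of step 3-10 might look roughly as follows (a sketch assuming one trained PyTorch Q-network per pursuer and a hypothetical env interface; neither is specified by the invention):

```python
import torch

def deploy(q_nets, env, max_steps=200):
    """Run the trained PER-IDQN policies greedily until the target is captured (step 3-10)."""
    # q_nets are assumed to be restored beforehand, e.g. net.load_state_dict(torch.load(path))
    states = env.reset()                                   # one state vector per enclosure UAV
    for t in range(max_steps):
        actions = []
        for i, net in enumerate(q_nets):                   # one independent Q-network per pursuer
            s = torch.as_tensor(states[i], dtype=torch.float32).unsqueeze(0)
            actions.append(int(net(s).argmax(dim=1)))      # greedy flight action
        states, done = env.step(actions)
        if done:                                           # capture condition of step 2-4 satisfied
            return t + 1
    return None
```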
In order to better illustrate the superiority of the method of the present invention, the present embodiment was tested in different scenes. Specifically, in a task scenario in which the number of grids is 80 × 40, the obstacle mobility is kept at 10%, different obstacle coverage rates are set and the test is performed, and the test results are shown in table 1.
Table 1. Multi-unmanned aerial vehicle enclosure capture performance under different environmental obstacle coverage rates (the table data appear only as images in the original and are not reproduced here).
As the table shows, the multi-unmanned aerial vehicle capture time increases as the environmental obstacle coverage increases. When the obstacle coverage reaches 0.10 or above, the enclosure capture tactics based on the PER-IDQN algorithm require fewer average simulation steps than the traditional IDQN algorithm, meaning that the tactics formulated by the PER-IDQN-based multi-unmanned aerial vehicle system in a complex obstacle environment are more effective and the targets can be captured in a shorter time. A simulation diagram of the multi-unmanned aerial vehicle enclosure capture is shown in Fig. 4.
In summary, the PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method provided by the invention trains the neural networks offline, stores the data generated during training in an experience pool to provide learning samples for optimizing the networks, and designs the actions and states of the unmanned aerial vehicles according to the maneuvering-control and cooperative-capture task requirements, thereby realizing intelligent decision control of multiple unmanned aerial vehicles.
The multi-unmanned aerial vehicle enclosure capture tactical method provided by the invention trains its model efficiently, and the constructed tactical model can be applied in complex dynamic scenes, improving the execution efficiency of multi-unmanned aerial vehicle enclosure capture tactics.
The above description is only a preferred embodiment of the present invention, and it should be noted that: the embodiments of the present invention are not limited to the above-described implementation methods; it will be apparent to those skilled in the art that other variations and modifications can be made without departing from the spirit of the invention. It should be understood that any equivalent substitutions, modifications and improvements made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (4)

1. A multi-unmanned aerial vehicle enclosure tactical method based on PER-IDQN is characterized by comprising the following steps:
step 1: constructing a grid digital map model and an unmanned aerial vehicle model;
step 2: constructing a multi-unmanned aerial vehicle enclosure decision model based on a PER-IDQN algorithm;
step 3: constructing and training a multi-unmanned aerial vehicle enclosure capture decision model based on the PER-IDQN algorithm; each unmanned aerial vehicle inputs its state information into its neural network, the trained PER-IDQN network maps the state to a flight action, and the enclosure unmanned aerial vehicles realize the capture of the target through cooperative decisions.
2. The PER-IDQN-based multi-drone containment tactical method of claim 1, wherein:
the steps of constructing the grid digital map model and the unmanned aerial vehicle model are as follows:
step 1-1: in order to conveniently quantify the specific position of the unmanned aerial vehicle, the whole airspace is uniformly divided into grids, each grid being a square with side length l kilometers; the task scene contains a × b grids, so its total width is l_width = a·l kilometers and its total length is l_length = b·l kilometers;
step 1-2: setting the speed of the capture unmanned aerial vehicle to be l kilometer/time step, and setting the speed of the target unmanned aerial vehicle to be n x l kilometer/time step;
step 1-3: setting the size of the action space of the unmanned aerial vehicle to be 4, namely the unmanned aerial vehicle can only move in four directions, namely up, down, left and right directions in each step;
step 1-4: the detectable range of each unmanned aerial vehicle is set as a circular area of radius l kilometers, which in the grid scene is approximated by the 3 × 3 nine-grid neighborhood centered on the unmanned aerial vehicle.
3. The PER-IDQN-based multi-drone containment tactical method of claim 1, wherein:
the step 2 of constructing a multi-unmanned aerial vehicle trapping decision model based on a PER-IDQN algorithm comprises the following steps:
step 2-1: setting the action space A of the enclosure unmanned aerial vehicle as follows:
A = [(0, -l), (0, l), (-l, 0), (l, 0)]
where (0, -l), (0, l), (-l, 0) and (l, 0) represent the 4 actions of the unmanned aerial vehicle moving down, up, left and right, and l represents the side length of each grid;
step 2-2: setting the state space S of the enclosure unmanned aerial vehicle as follows:
S = [S_uav, S_teamer, S_obser, S_target, S_finish]
where S_uav, S_teamer, S_obser, S_target and S_finish respectively represent the unmanned aerial vehicle's own state information, the information of the other friendly unmanned aerial vehicles, the unmanned aerial vehicle's detection information, the target information and the task state information;
specifically, for the i-th unmanned aerial vehicle in the multi-vehicle enclosure system, its own state information is set as S_uav^i = [x_i, y_i], where x_i and y_i are the horizontal and vertical coordinates of the i-th unmanned aerial vehicle;
for the i-th unmanned aerial vehicle, the obtainable friendly-vehicle state information S_teamer^i consists of the coordinates (x_j, y_j) of the other friendly unmanned aerial vehicles, where n represents the number of unmanned aerial vehicles;
the observation information of unmanned aerial vehicle i is set as S_obser^i = [o_1, …, o_m, …], where the detection readings o_m represent the exploration information of the enclosure unmanned aerial vehicle at the positions of its surrounding nine-grid neighborhood;
in addition, combining the relative distance and bearing information of the target with respect to unmanned aerial vehicle i, the obtainable target information of the i-th enclosure unmanned aerial vehicle is set as S_target^i = [d_i, θ_i], where d_i and θ_i respectively represent the distance and relative azimuth between the enclosure unmanned aerial vehicle and the target, and x_e and y_e are the horizontal and vertical coordinates of the escaping target;
in addition, to help the enclosure unmanned aerial vehicles effectively complete the capture of the target, a sub-state quantity S_finish^i is set for the i-th enclosure unmanned aerial vehicle, indicating whether the target has been completely captured;
step 2-3: considering the three decision processes involved in the multi-unmanned aerial vehicle enclosure capture tactics (maneuvering approach to the target, cooperative capture and autonomous obstacle avoidance), the reward function R of each individual enclosure unmanned aerial vehicle is set as follows:
R = σ_1·r_pos + σ_2·r_safe + σ_3·r_effi + σ_4·r_task
where r_pos, r_safe, r_effi and r_task respectively represent the position reward, the safe-flight reward, the efficient-flight reward and the task-completion reward, and σ_1 to σ_4 are the corresponding weights of the rewards;
specifically, the position sub-reward is set as follows:
r_pos = (|x_e - x_i| + |y_e - y_i|) - (|x_e - x_i| + |y_e - y_i|)′
where (|x_e - x_i| + |y_e - y_i|) and (|x_e - x_i| + |y_e - y_i|)′ respectively represent the distance between the unmanned aerial vehicle and the target at the current moment and at the next moment;
the safe-flight sub-reward of the enclosure unmanned aerial vehicle is given by a piecewise expression (shown only as an image in the original and not reproduced here);
the efficient-flight sub-reward of the enclosure unmanned aerial vehicle is set as follows:
r_effi = -n_stay
where n_stay represents the number of times the enclosure unmanned aerial vehicle has stayed at its current grid position;
the task-completion sub-reward of the enclosure unmanned aerial vehicle is likewise given by a piecewise expression (shown only as an image in the original and not reproduced here);
step 2-4: the multi-unmanned aerial vehicle capture judgment condition is set as follows: when the distance between the target and each enclosure unmanned aerial vehicle is one unit grid, the target is regarded as unable to escape and the capture task is completed.
4. The PER-IDQN-based multi-drone containment tactical method of claim 1, wherein:
step 3: constructing and training the multi-unmanned aerial vehicle enclosure capture decision model based on the PER-IDQN algorithm:
step 3-1: for each enclosure unmanned aerial vehicle, construct the main BP neural network with parameters θ_i and the state-action value function Q(s_t^i, a_t^i; θ_i) of the PER-IDQN algorithm, where the inputs s_t^i and a_t^i are respectively the state and action of unmanned aerial vehicle i at time t; copy the main network parameters θ_i to the target network θ_i′, i.e. θ_i → θ_i′, where i represents the unmanned aerial vehicle serial number;
step 3-2: set the size of the experience replay queue to M, the discount factor to γ, the maximum number of episodes to E, the maximum number of steps per episode to T and the batch size to N_batch; set the episode counter e = 0;
step 3-3: initialize the n states s_1, …, s_n of the enclosure unmanned aerial vehicles and set the current time t = 0;
step 3-4: generate a random number z and, for each unmanned aerial vehicle i, select the action
a_t^i = a random action from A, if z < ε_greedy; otherwise a_t^i = argmax_a Q(s_t^i, a; θ_i)
where ε_greedy is the greedy coefficient and argmax_a Q(s_t^i, a; θ_i) is the action corresponding to the maximum Q value output by the main network;
step 3-5: execute the action set a_1, …, a_n, calculate the reward values r_1, …, r_n, update the states to s′_1, …, s′_n, compute the priorities p_1, …, p_n, and store the transitions together in the experience replay queue;
step 3-6: sample N_batch experience samples according to the probability P(j) = p_j^α / Σ_k p_k^α, where j represents the serial number of the extracted experience sample, p_j represents its priority, and the parameter α adjusts the degree of prioritized sampling;
calculate the importance-sampling weight coefficient w_j:
w_j = (M·P(j))^(-β) / max_i w_i
where β is a hyper-parameter used to adjust the influence of importance sampling on the PER algorithm and the convergence rate of the model;
calculate the temporal-difference error at the current moment:
δ_t^i = Y_t^i - Q(s_t^i, a_t^i; θ_i)
where r_{t+1}^i denotes the reward obtained by unmanned aerial vehicle i at time t + 1;
calculate the target value Y_t^i:
Y_t^i = r_{t+1}^i + γ·max_a Q(s′_j, a; θ_i′)
where γ is the reward discount factor, j is the sample number, and θ_i′ represents the target network of the i-th agent;
combining the importance weights w_j, update the parameters of the current network by minimizing the loss function L(θ_i):
L(θ_i) = (1/N_batch)·Σ_j w_j·(Y_j^i - Q(s_j^i, a_j^i; θ_i))²
step 3-7: update the target network parameters of each unmanned aerial vehicle agent respectively:
θ_i′ ← τ·θ_i + (1 - τ)·θ_i′
where τ represents the update scale factor;
step 3-8: increment the step counter t by 1 and perform the judgment: if t < T and the multi-unmanned aerial vehicle capture judgment condition is not met, return to step 3-4; otherwise, go to step 3-9;
step 3-9: increment the episode counter e by 1 and perform the judgment: if e < E, return to step 3-3; otherwise, finish the training and go to step 3-10;
step 3-10: terminate the PER-IDQN network training process and save the current network parameters; load the saved parameters into the multi-unmanned aerial vehicle capture system; at each moment, each unmanned aerial vehicle inputs its state information into its neural network, the trained PER-IDQN network outputs the flight action of the unmanned aerial vehicle, and the enclosure unmanned aerial vehicles realize the capture of the target through cooperative decisions.
CN202210525303.XA 2022-05-15 2022-05-15 PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method Pending CN114815891A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210525303.XA CN114815891A (en) 2022-05-15 2022-05-15 PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210525303.XA CN114815891A (en) 2022-05-15 2022-05-15 PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method

Publications (1)

Publication Number Publication Date
CN114815891A 2022-07-29

Family

ID=82514417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210525303.XA Pending CN114815891A (en) 2022-05-15 2022-05-15 PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method

Country Status (1)

Country Link
CN (1) CN114815891A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116166034A (en) * 2023-04-25 2023-05-26 清华大学 Cross-domain collaborative trapping method, device and system
CN116337086A (en) * 2023-05-29 2023-06-27 中国人民解放军海军工程大学 Method, system, medium and terminal for calculating optimal capturing position of unmanned aerial vehicle network capturing
CN116337086B (en) * 2023-05-29 2023-08-04 中国人民解放军海军工程大学 Method, system, medium and terminal for calculating optimal capturing position of unmanned aerial vehicle network capturing

Similar Documents

Publication Publication Date Title
CN113495578B (en) Digital twin training-based cluster track planning reinforcement learning method
CN108731684B (en) Multi-unmanned aerial vehicle cooperative area monitoring airway planning method
CN113095481B (en) Air combat maneuver method based on parallel self-game
CN113110592A (en) Unmanned aerial vehicle obstacle avoidance and path planning method
CN114815891A (en) PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method
CN112131786A (en) Target detection and distribution method and device based on multi-agent reinforcement learning
CN113791634A (en) Multi-aircraft air combat decision method based on multi-agent reinforcement learning
CN112198892B (en) Multi-unmanned aerial vehicle intelligent cooperative penetration countermeasure method
CN113848974B (en) Aircraft trajectory planning method and system based on deep reinforcement learning
CN114330115B (en) Neural network air combat maneuver decision-making method based on particle swarm search
CN116661503B (en) Cluster track automatic planning method based on multi-agent safety reinforcement learning
CN113268081B (en) Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning
CN113625569B (en) Small unmanned aerial vehicle prevention and control decision method and system based on hybrid decision model
CN114089776B (en) Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
CN113893539A (en) Cooperative fighting method and device for intelligent agent
CN115185294B (en) QMIX-based aviation soldier multi-formation collaborative autonomous behavior decision modeling method
CN115097861B (en) Multi-unmanned aerial vehicle trapping strategy method based on CEL-MADDPG
CN117150757A (en) Simulation deduction system based on digital twin
CN115981369A (en) Method for joint task allocation and flight path planning of multiple unmanned aerial vehicles under limited communication
CN117313561B (en) Unmanned aerial vehicle intelligent decision model training method and unmanned aerial vehicle intelligent decision method
CN117908565A (en) Unmanned aerial vehicle safety path planning method based on maximum entropy multi-agent reinforcement learning
CN117574950A (en) Multi-agent self-organizing collaborative trapping method in non-convex environment
CN115903885B (en) Unmanned aerial vehicle flight control method of swarm Agent model based on task traction
CN113110101A (en) Production line mobile robot gathering type recovery and warehousing simulation method and system
CN114089751A (en) Mobile robot path planning method based on improved DDPG algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination