CN114815891A - PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method - Google Patents
PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method
- Publication number: CN114815891A
- Application number: CN202210525303.XA
- Authority: CN (China)
- Prior art keywords: unmanned aerial vehicle, enclosure, target, idqn
- Prior art date: 2022-05-15
- Legal status: Pending
Classifications
- G05D1/683 — Control of position, course, altitude or attitude of vehicles: intercepting moving targets
- G05D1/101 — Simultaneous control of position or course in three dimensions, specially adapted for aircraft
- G05D1/2464 — Determining position or orientation using environment maps (SLAM) with an occupancy grid
- G05D1/43 — Control of position or course in two dimensions
- G05D1/6983 — Coordinated control of two or more vehicles: control allocation by distributed or sequential control
- G05D2101/15 — Control architectures using artificial intelligence techniques, machine learning, e.g. neural networks
- G05D2105/35 — Specific applications of the controlled vehicles: combat
- G05D2109/20 — Types of controlled vehicles: aircraft, e.g. drones
Abstract
The invention provides a PER-IDQN-based multi-unmanned aerial vehicle (UAV) enclosure capture tactical method. A grid digital map and a UAV motion model are built; through the interaction of each UAV with the environment, a multi-UAV neural network model is deployed using the deep Q-network (DQN) algorithm and optimized with a prioritized experience replay (PER) strategy; a state space, an action space and a reward function are then constructed to design the multi-UAV enclosure capture tactical model in a targeted way. The finally constructed model can formulate effective enclosure capture tactics in complex obstacle environments and achieve capture of a maneuvering target. The method effectively improves the sampling efficiency of experience samples, solves the problem of slow training of the UAV decision model in complex task scenarios, and is suitable for multi-UAV enclosure capture and autonomous obstacle-avoidance tasks in complex dynamic environments; the finally constructed multi-UAV enclosure capture tactical model also has higher stability.
Description
Technical Field
The invention relates to the field of multi-agent systems and intelligent decision-making for unmanned aerial vehicles (UAVs), and in particular to a multi-UAV enclosure capture tactical method.
Background
Unmanned aerial vehicles offer strong concealment and high safety, providing a new operating concept for the multi-vehicle cooperation and low casualty rates required by modern information-based defense tactics. In scenarios where an enemy aircraft intrudes into friendly airspace to conduct illegal reconnaissance, it is of great significance to form a formation of multiple defensive UAVs that can, according to the situational environment, automatically carry out encirclement and expulsion of the target or accompany and monitor it.
Existing research on multi-UAV enclosure capture tactics is limited; most approaches use artificial intelligence methods to solve for the target position in real time and then plan a corresponding tracking path to approach and capture the target. Patent publication CN112241173A proposes an artificial-potential-field-based intelligent planning method for multi-agent aggregation points: the target is converted into virtual aggregation points, the repulsive forces between agents and between agents and obstacles are computed with an artificial potential field model, and the positions and path information of the agents' virtual aggregation points are derived. However, that method does not account for the heavy computational load of the model in a dynamic environment and cannot guarantee real-time multi-agent decision-making. In recent years, the development of deep reinforcement learning has provided a new approach to real-time online intelligent decision-making for unmanned systems. Patent publication CN113625775A provides a multi-UAV enclosure capture method that combines state prediction with DDPG: UAV states are predicted with a least-squares method, a UAV model is trained with the deep reinforcement learning DDPG algorithm, and the model is deployed in a multi-UAV system for enclosure capture decisions. However, when this method trains the UAV decision model, the volume of training samples is large and the variables are complex, so training efficiency is low, and the resulting multi-UAV capture model has poor stability and certain limitations.
Prioritized experience replay is an optimization technique for deep reinforcement learning: by computing the importance of each experience sample and ranking samples by priority, it increases the usage rate of high-priority samples and ultimately speeds up agent training. How to introduce the prioritized experience replay strategy into multi-agent deep reinforcement learning, combine it with a complex multi-UAV enclosure capture tactical model to improve the autonomous behavior of each UAV, and finally achieve capture of the target through cooperative decision-making has therefore become a difficult problem for applying deep reinforcement learning to intelligent multi-UAV decision-making.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a PER-IDQN-based multi-UAV enclosure capture tactical method, i.e., a method based on an Independent Deep Q-Network with Prioritized Experience Replay (PER-IDQN). Specifically, a grid digital map and a UAV motion model are built; through the interaction of each UAV with the environment, a multi-UAV neural network model is deployed using the Deep Q-Network (DQN) algorithm and optimized with a prioritized experience replay (PER) strategy; a state space, an action space and a reward function are then constructed to design the multi-UAV enclosure capture tactical model in a targeted way. The finally constructed model can formulate effective enclosure capture tactics in complex obstacle environments and thereby achieve capture of a maneuvering target.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step 1: constructing a grid digital map model and an unmanned aerial vehicle model;
step 2: constructing a multi-unmanned aerial vehicle enclosure decision model based on a PER-IDQN algorithm;
Step 3: constructing and training the multi-UAV enclosure capture decision model based on the PER-IDQN algorithm; each UAV then feeds its state information into the neural network, the trained PER-IDQN network performs the fitting and outputs the UAV's flight action, and the pursuing UAVs achieve capture of the target through cooperative decisions.
The steps of constructing the grid digital map model and the unmanned aerial vehicle model are as follows:
step 1-1: to conveniently quantify the specific position of a UAV, the whole airspace is uniformly divided into grids, each grid being a square with side length l kilometres; for a task scene of a × b grids, the total width is l_width = a·l kilometres and the total length is l_length = b·l kilometres;
step 1-2: setting the speed of the capture unmanned aerial vehicle to be l kilometer/time step, and setting the speed of the target unmanned aerial vehicle to be n x l kilometer/time step;
step 1-3: the size of the UAV action space is set to 4, i.e., at each step a UAV can move only up, down, left or right;
step 1-4: the detection range of each UAV is set to a circular area of radius l kilometres, which in the grid scene is approximated by the 3×3 block of cells (nine-grid) centred on the UAV.
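To make steps 1-1 to 1-4 concrete, the following is a minimal Python sketch of the grid map and UAV motion/detection model, assuming a simple occupancy-grid representation; the class name, obstacle ratio and random seed are illustrative choices rather than part of the patent.

```python
import numpy as np

# Minimal grid-world sketch of the map and UAV motion model described in step 1.
# Grid of a x b square cells of side l km; UAVs move one cell per time step in
# four directions; each UAV senses the 3x3 neighbourhood of cells centred on it.
class GridWorld:
    ACTIONS = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # down, up, left, right (in cells)

    def __init__(self, a=80, b=40, cell_km=0.1, obstacle_ratio=0.05, rng=None):
        self.a, self.b, self.cell_km = a, b, cell_km
        self.rng = rng or np.random.default_rng(0)
        # occupancy grid: 1 = obstacle, 0 = free
        self.grid = (self.rng.random((a, b)) < obstacle_ratio).astype(int)

    def in_bounds(self, pos):
        x, y = pos
        return 0 <= x < self.a and 0 <= y < self.b

    def step_uav(self, pos, action_idx):
        """Move a UAV by one cell; stay in place if the move is illegal."""
        dx, dy = self.ACTIONS[action_idx]
        nxt = (pos[0] + dx, pos[1] + dy)
        if self.in_bounds(nxt) and self.grid[nxt] == 0:
            return nxt
        return pos

    def detect(self, pos):
        """Readings o_1..o_9 over the 3x3 neighbourhood (1 = obstacle or out of map)."""
        readings = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                cell = (pos[0] + dx, pos[1] + dy)
                readings.append(1 if not self.in_bounds(cell) else int(self.grid[cell]))
        return readings
```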
The step 2 of constructing a multi-unmanned aerial vehicle trapping decision model based on a PER-IDQN algorithm comprises the following steps:
step 2-1: the action space A of the capture unmanned aerial vehicle is set as follows:
A = [(0, -l), (0, l), (-l, 0), (l, 0)]
where (0, -l), (0, l), (-l, 0) and (l, 0) represent the 4 actions of the UAV moving down, up, left and right respectively, and l is the side length of each grid;
step 2-2: setting the state space S of the capture unmanned aerial vehicle as follows:
S = [S_uav, S_teamer, S_obser, S_target, S_finish]
where S_uav, S_teamer, S_obser, S_target and S_finish respectively denote the UAV's own state information, the information of other friendly UAVs, the UAV's detection information, the target information and the task-state information;
specifically, for the i-th UAV in the multi-UAV pursuit system, its own state information S_uav is set from its coordinates, where x_i and y_i denote the coordinate information of the i-th UAV;
for the i-th UAV, the obtainable friendly-UAV state information S_teamer is set from the coordinates of the other pursuing UAVs, where x_i and y_i denote the horizontal and vertical coordinates of the i-th UAV and n denotes the number of UAVs;
o_m denotes the detection reading of the pursuing UAV at the m-th position of the surrounding 3×3 (nine-grid) neighbourhood, and these readings form the detection information S_obser;
in addition, combining the relative distance and bearing of the target with respect to the i-th friendly UAV, the target information S_target obtainable by the i-th pursuing UAV is set, where d_i and θ_i respectively denote the distance and relative azimuth between the pursuing UAV and the target, and x_e and y_e denote the horizontal and vertical coordinates of the escaping target;
in addition, to help the pursuing UAVs complete the encirclement of the target effectively, a sub-state quantity S_finish is set for the i-th pursuing UAV, indicating whether the target has been fully captured;
step 2-3: considering the three decision processes in multi-UAV enclosure capture tactics (maneuvering approach to the target, cooperative encirclement and autonomous obstacle avoidance), the reward function R for each individual pursuing UAV is set as:
R = σ_1·r_pos + σ_2·r_safe + σ_3·r_effi + σ_4·r_task
where r_pos, r_safe, r_effi and r_task respectively denote the position reward, safe-flight reward, efficient-flight reward and task-completion reward, and σ_1~σ_4 are the corresponding weights of the rewards;
specifically, the position sub-reward is set as follows:
r_pos = (|x_e - x_i| + |y_e - y_i|) - (|x_e - x_i| + |y_e - y_i|)'
where (|x_e - x_i| + |y_e - y_i|) and (|x_e - x_i| + |y_e - y_i|)' respectively denote the distance between the UAV and the target at the current time step and at the next time step;
the safe-flight sub-reward r_safe of the pursuing UAV is set as follows:
the efficient-flight sub-reward r_effi of the pursuing UAV is set as follows:
r_effi = -n_stay
where n_stay denotes the number of times the pursuing UAV has stayed at its current grid position;
the task-completion sub-reward r_task of the pursuing UAV is set as follows:
step 2-4: the multi-UAV capture criterion is set as follows: when every pursuing UAV is within one unit grid distance of the target, the target is regarded as unable to escape and the capture task is completed.
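The reward shaping of step 2-3 and the capture criterion of step 2-4 can be sketched as follows; the weight values σ_1~σ_4 and the piecewise forms of the safe-flight and task-completion terms are assumptions, since their exact expressions are not reproduced above.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def position_reward(uav_pos, uav_pos_next, target_pos, target_pos_next):
    # r_pos = d(t) - d(t+1): positive when the pursuer closes in on the target
    return manhattan(target_pos, uav_pos) - manhattan(target_pos_next, uav_pos_next)

def reward(uav_pos, uav_pos_next, target_pos, target_pos_next,
           collided, n_stay, captured,
           sigma=(1.0, 1.0, 0.1, 10.0),        # assumed weights sigma_1..sigma_4
           r_collision=-10.0, r_capture=20.0): # assumed piecewise values
    r_pos = position_reward(uav_pos, uav_pos_next, target_pos, target_pos_next)
    r_safe = r_collision if collided else 0.0   # safe-flight term (assumed form)
    r_effi = -n_stay                            # efficient-flight term from the text
    r_task = r_capture if captured else 0.0     # task-completion term (assumed form)
    s1, s2, s3, s4 = sigma
    return s1 * r_pos + s2 * r_safe + s3 * r_effi + s4 * r_task

def is_captured(target_pos, pursuer_positions):
    # Capture criterion of step 2-4: every pursuer within one grid cell of the target.
    return all(manhattan(target_pos, p) <= 1 for p in pursuer_positions)
```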
The step 3: constructing a multi-unmanned-plane enclosure capture decision model based on a PER-IDQN algorithm and training;
step 3-1: for each pursuing UAV, construct the main BP neural network (hidden-layer parameters θ_i) and the state-action value function Q(s_t^i, a_t^i; θ_i) of the PER-IDQN algorithm, where the value-function inputs s_t^i and a_t^i are the state and action of UAV i at time t; copy the main network parameters θ_i to the target network θ_i', i.e., θ_i → θ_i', where i denotes the UAV index;
step 3-2: set the experience replay queue size M, the discount factor γ, the maximum number of episodes E, the maximum number of steps T per episode and the experience mini-batch size N_batch; initialise the episode counter e = 0;
step 3-3: initialise the states s_1, …, s_n of the n pursuing UAVs and set the current time step t = 0;
step 3-4: generate a random number z and, for each UAV i, select its action with the ε-greedy rule: a random exploratory action is taken when z is below the greedy coefficient ε_greedy, otherwise the action corresponding to the maximum Q value output by the main network is executed;
step 3-5: execute the action set a_1, …, a_n, compute the reward values r_1, …, r_n, update the states to s'_1, …, s'_n, compute the priorities p_1, …, p_n, and store the transitions together in the experience replay queue;
step 3-6: sample N_batch experiences from the replay queue according to the priority-based probability P(j) = p_j^α / Σ_k p_k^α, where j is the index of the sampled experience, p_j denotes its priority, and the parameter α adjusts the degree of prioritized sampling;
compute the importance-sampling weight coefficient w_j:
w_j = (M · P(j))^(-β) / max_i w_i
where β is a hyper-parameter that adjusts the influence of importance sampling on the PER algorithm and on the model's convergence rate;
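A small sketch of the prioritized sampling and importance-weight computation of step 3-6 is given below, assuming the standard PER formulation P(j) = p_j^α / Σ_k p_k^α; function and argument names are illustrative.

```python
import numpy as np

def per_sample(priorities, n_batch, alpha=0.6, beta=0.4, rng=None):
    """Prioritized sampling and importance weights over a replay queue.

    priorities : array p_1..p_M for the M stored transitions
    alpha      : degree of prioritization (alpha = 0 -> uniform sampling)
    beta       : importance-sampling correction exponent
    """
    rng = rng or np.random.default_rng(0)
    p = np.asarray(priorities, dtype=float) ** alpha
    probs = p / p.sum()                               # P(j) = p_j^alpha / sum_k p_k^alpha
    idx = rng.choice(len(probs), size=n_batch, p=probs)
    M = len(probs)
    weights = (M * probs[idx]) ** (-beta)             # w_j = (M * P(j))^(-beta)
    weights /= weights.max()                          # normalise by max_i w_i
    return idx, weights
```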
compute the temporal-difference error at the current time step, where γ is the reward discount factor, j is the sample index, and θ_i' denotes the target network of the i-th agent;
combining the importance weights w_j, update the current network parameters by minimising the loss function L(θ_i):
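For reference, a conventional independent-DQN temporal-difference error and importance-weighted loss that are consistent with the symbols defined above take the following form (an illustrative reconstruction, not necessarily the exact expressions used):

$$\delta_j = r_j + \gamma \max_{a'} Q\left(s'_j, a'; \theta_{i'}\right) - Q\left(s_j, a_j; \theta_i\right), \qquad L(\theta_i) = \frac{1}{N_{batch}} \sum_{j} w_j\, \delta_j^{2}$$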
Step 3-7: respectively updating the target network parameters of each unmanned aerial vehicle agent:
θ_i' ← τ·θ_i + (1 - τ)·θ_i'
where τ denotes the soft-update scale factor;
step 3-8: increment the step counter t by 1 and check: if t < T and the multi-UAV capture criterion is not met, return to step 3-4; otherwise, go to step 3-9;
step 3-9: increment the episode counter e by 1 and check: if e < E, return to step 3-3; otherwise, finish training and go to step 3-10;
step 3-10: terminate the PER-IDQN training process and save the current network parameters; load the saved parameters into the multi-UAV capture system, where at each time step every UAV feeds its state information into its neural network, the trained PER-IDQN network performs the fitting and outputs the UAV's flight action, and the pursuing UAVs achieve capture of the target through cooperative decisions.
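The per-UAV update of steps 3-6 and 3-7 can be sketched in PyTorch as below. The network architecture, optimiser, hyper-parameter values and the use of |δ_j| as the new priority are assumptions made for illustration; in the full procedure this update sits inside the episode/step loops of steps 3-3 to 3-9, with actions chosen ε-greedily and transitions stored with priorities as in step 3-5, and the batch and weights supplied by a prioritized replay queue such as the per_sample sketch above.

```python
import torch
import torch.nn as nn

# Sketch of one PER-IDQN update (steps 3-6 .. 3-7) for a single pursuing UAV.
class QNet(nn.Module):
    def __init__(self, state_dim, n_actions=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s):
        return self.net(s)

def per_idqn_update(q, q_target, optimizer, batch, weights, gamma=0.99, tau=0.01):
    """batch = (s, a, r, s_next, done) tensors; a is a LongTensor of action indices;
    weights is the tensor of importance-sampling weights w_j."""
    s, a, r, s_next, done = batch
    with torch.no_grad():
        # target value: r_j + gamma * max_a' Q(s'_j, a'; theta_i')
        y = r + gamma * (1.0 - done) * q_target(s_next).max(dim=1).values
    q_sa = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    td_error = y - q_sa
    loss = (weights * td_error.pow(2)).mean()   # importance-weighted loss L(theta_i)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # soft target update: theta_i' <- tau*theta_i + (1 - tau)*theta_i'
    with torch.no_grad():
        for p, p_t in zip(q.parameters(), q_target.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)
    # new priorities p_j are commonly set to |delta_j| + eps (assumption)
    return loss.item(), td_error.abs().detach() + 1e-6
```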
The beneficial effects of the proposed PER-IDQN-based multi-UAV enclosure capture tactical method are as follows:
(1) the constructed multi-UAV capture decision system does not require the tactics of each UAV to be specified separately; tactical and task cooperation is completed through environment perception and information sharing among the UAVs, and the finally formulated multi-UAV capture tactics can achieve encirclement and capture of a maneuvering target.
(2) the method introduces the prioritized experience replay (PER) strategy into the IDQN algorithm, which effectively improves the sampling efficiency of experience samples and solves the problem of slow training of the UAV decision model in complex task scenarios. The finally constructed multi-UAV enclosure capture tactical model is more stable and is suitable for multi-UAV capture and autonomous obstacle-avoidance tasks in complex dynamic environments.
Drawings
Fig. 1 is a schematic view of unmanned aerial vehicle detection.
Fig. 2 is a schematic diagram of a position relationship between the unmanned aerial vehicle for enclosure and the target.
Fig. 3 is a schematic diagram of training of a multi-unmanned aerial vehicle capture model based on PER-IDQN.
Fig. 4 is a schematic diagram of a multi-drone enclosure capture.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The invention provides a PER-IDQN-based multi-unmanned aerial vehicle containment tactical method, and the whole flow is shown in figure 3. The technical solution is further clearly and completely described below with reference to the accompanying drawings and specific embodiments:
step 1: constructing a grid digital map model and an unmanned aerial vehicle model;
step 1-1: to conveniently quantify the specific position of a UAV, the whole airspace is divided into grids; the side length of each grid is set to 0.1 km, and the total width and total length of the task scene are set to l_width = 8 km and l_length = 4 km respectively;
step 1-2: setting the speed of the capture unmanned aerial vehicle to be 0.1 kilometer per time step, and setting the speed of the target unmanned aerial vehicle to be 0.2 kilometer per time step;
step 1-3: the size of the UAV action space is set to 4, i.e., at each step a UAV can move only up, down, left or right;
step 1-4: the detection range of each UAV is set to a circular area of radius 0.1 km, which in the grid scene is approximated by the 3×3 block of cells (nine-grid) centred on the UAV;
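With the GridWorld sketch from the Disclosure section above, the concrete embodiment of steps 1-1 to 1-4 corresponds to the following illustrative parameters (the names are assumptions, not part of the patent):

```python
# Concrete embodiment parameters from steps 1-1 .. 1-4 of this section.
CELL_KM = 0.1                       # grid side length l
GRID_A, GRID_B = 80, 40             # 8 km x 4 km airspace => 80 x 40 cells
PURSUER_CELLS_PER_STEP = 1          # 0.1 km per time step
TARGET_CELLS_PER_STEP = 2           # 0.2 km per time step
env = GridWorld(a=GRID_A, b=GRID_B, cell_km=CELL_KM)
```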
step 2: constructing a multi-unmanned aerial vehicle enclosure decision model based on a PER-IDQN algorithm;
step 2-1: setting an action space A of the capture unmanned aerial vehicle as follows:
A = [(0, -l), (0, l), (-l, 0), (l, 0)]
where (0, -l), (0, l), (-l, 0) and (l, 0) represent the 4 actions of the UAV moving down, up, left and right respectively, and l is the side length of each grid;
step 2-2: setting the state space S of the capture unmanned aerial vehicle as follows:
S = [S_uav, S_teamer, S_obser, S_target, S_finish]
where S_uav, S_teamer, S_obser, S_target and S_finish respectively denote the UAV's own state information, the information of other friendly UAVs, the UAV's detection information, the target information and the task-state information;
specifically, for the i-th UAV in the multi-UAV pursuit system, its own state information S_uav is set from its coordinates, where x_i and y_i denote the coordinate information of the i-th UAV;
for the i-th UAV, the obtainable friendly-UAV state information S_teamer is set from the coordinates of the other pursuing UAVs, where x_i and y_i denote the horizontal and vertical coordinates of the i-th UAV and n denotes the number of UAVs;
o_m denotes the detection reading of the pursuing UAV at the m-th position of the surrounding 3×3 (nine-grid) neighbourhood, and these readings form the detection information S_obser; the UAV detection information is shown in Fig. 1;
in addition, combining the relative distance and bearing of the target with respect to the i-th friendly UAV, the target information S_target obtainable by the i-th pursuing UAV is set, where d_i and θ_i respectively denote the distance and relative azimuth between the pursuing UAV and the target, and x_e and y_e denote the horizontal and vertical coordinates of the escaping target; the positional relationship between the pursuing UAV and the target is shown in Fig. 2;
in addition, to help the pursuing UAVs complete the encirclement of the target effectively, a sub-state quantity S_finish is set for the i-th pursuing UAV, indicating whether the target has been fully captured;
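The relative distance d_i and azimuth θ_i used in the target information S_target can be computed, for example, as follows; the azimuth convention (angle measured from the positive x-axis) is an assumption, since the text only states that a distance and a relative azimuth are used.

```python
import math

def target_observation(uav_pos, target_pos):
    """Relative distance d_i and azimuth theta_i of the target seen from UAV i."""
    dx = target_pos[0] - uav_pos[0]
    dy = target_pos[1] - uav_pos[1]
    d_i = math.hypot(dx, dy)              # Euclidean distance in grid cells
    theta_i = math.atan2(dy, dx)          # relative azimuth in radians (assumed convention)
    return d_i, theta_i
```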
step 2-3: considering the decision processes in multi-UAV enclosure capture tactics, such as maneuvering approach to the target, cooperative capture and autonomous obstacle avoidance, the reward function R for each individual pursuing UAV is set as:
R = σ_1·r_pos + σ_2·r_safe + σ_3·r_effi + σ_4·r_task
where r_pos, r_safe, r_effi and r_task respectively denote the position reward, safe-flight reward, efficient-flight reward and task-completion reward, and σ_1~σ_4 are the corresponding weights of the reward terms;
specifically, the position sub-reward is set as follows:
r_pos = (|x_e - x_i| + |y_e - y_i|) - (|x_e - x_i| + |y_e - y_i|)'
where (|x_e - x_i| + |y_e - y_i|) and (|x_e - x_i| + |y_e - y_i|)' respectively denote the distance between the UAV and the target at the current time step and at the next time step;
the safe-flight sub-reward r_safe of the pursuing UAV is set as follows:
the efficient-flight sub-reward r_effi of the pursuing UAV is set as follows:
r_effi = -n_stay
where n_stay denotes the number of times the pursuing UAV has stayed at its current grid position;
the task-completion sub-reward r_task of the pursuing UAV is set as follows:
step 2-4: the multi-UAV capture criterion is set as follows: when every pursuing UAV is within one unit grid distance of the target, the target is regarded as unable to escape and the capture task is completed;
and step 3: constructing a multi-unmanned aerial vehicle capture decision model and training the model based on a deep reinforcement learning PER-IDQN algorithm;
step 3-1: for each pursuing UAV, construct the main BP neural network (hidden-layer parameters θ_i) and the state-action value function Q(s_t^i, a_t^i; θ_i) of the PER-IDQN algorithm, where the value-function inputs s_t^i and a_t^i are the state and action of UAV i at time t; copy the main network parameters θ_i to the target network θ_i', i.e., θ_i → θ_i', where i denotes the UAV index;
step 3-2: set the experience replay queue size M, the discount factor γ, the maximum number of episodes E, the maximum number of steps T per episode and the experience mini-batch size N_batch; initialise the episode counter e = 0;
step 3-3: initialise the states s_1, …, s_n of the n pursuing UAVs and set the current time step t = 0;
step 3-4: generate a random number z and, for each UAV i, select its action with the ε-greedy rule: a random exploratory action is taken when z is below the greedy coefficient ε_greedy, otherwise the action corresponding to the maximum Q value output by the main network is executed;
step 3-5: execute the action set a_1, …, a_n, compute the reward values r_1, …, r_n, update the states to s'_1, …, s'_n, compute the priorities p_1, …, p_n, and store the transitions together in the experience replay queue;
step 3-6: sample N_batch experiences from the replay queue according to the priority-based probability P(j) = p_j^α / Σ_k p_k^α, where j is the index of the sampled experience, p_j denotes its priority, and the parameter α adjusts the degree of prioritized sampling;
compute the importance-sampling weight coefficient w_j:
w_j = (M · P(j))^(-β) / max_i w_i
where M is the size of the experience replay queue and β is a hyper-parameter that adjusts the influence of importance sampling on the PER algorithm and on the model's convergence rate;
compute the temporal-difference error at the current time step and the corresponding target value Y_t^i, where γ is the reward discount factor, j is the sample index, and θ_i' denotes the target network of the i-th agent;
combining the importance weights w_j, update the current network parameters by minimising the loss function L(θ_i):
Step 3-7: respectively updating the target network parameters of each unmanned aerial vehicle agent:
θ_i' ← τ·θ_i + (1 - τ)·θ_i'
where τ denotes the soft-update scale factor;
step 3-8: update the step counter t = t + 1 and check: if t < T and the multi-UAV capture criterion of step 2-4 is not met, return to step 3-4; otherwise, go to step 3-9;
step 3-9: update the episode counter e = e + 1 and check: if e < E, return to step 3-3; otherwise, finish training and go to step 3-10;
step 3-10: terminate the PER-IDQN training process and save the current network parameters; load the saved parameters into the multi-UAV capture system. At each time step, every UAV feeds its state information into its neural network, the trained PER-IDQN network performs the fitting and outputs the UAV's flight action, and the pursuing UAVs achieve capture of the target through cooperative decisions.
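Deployment in step 3-10 amounts to loading each UAV's trained network and acting greedily on its own state at every time step; a sketch is shown below, reusing the QNet class from the training sketch above. The checkpoint file names are hypothetical.

```python
import torch

def load_policies(n_uavs, state_dim):
    """Load one trained Q-network per pursuing UAV (file names are illustrative)."""
    policies = []
    for i in range(n_uavs):
        q = QNet(state_dim)
        q.load_state_dict(torch.load(f"per_idqn_uav_{i}.pt"))
        q.eval()
        policies.append(q)
    return policies

def select_actions(policies, states):
    """Greedy cooperative decision: each UAV takes the argmax-Q flight action."""
    actions = []
    with torch.no_grad():
        for q, s in zip(policies, states):
            s = torch.as_tensor(s, dtype=torch.float32).unsqueeze(0)
            actions.append(int(q(s).argmax(dim=1).item()))
    return actions
```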
To better illustrate the advantages of the method, this embodiment was tested in different scenes. Specifically, in a task scenario with 80 × 40 grid cells, the obstacle mobility was kept at 10%, different obstacle coverage rates were set, and tests were performed; the results are shown in Table 1.
Table 1: multi-UAV enclosure capture performance under different environmental obstacle coverage rates
As can be seen from the table, the multi-UAV capture time increases as the environmental obstacle coverage rate grows. When the obstacle coverage rate reaches 0.10 or above, the multi-UAV capture tactics based on the PER-IDQN algorithm require fewer average simulation steps than the conventional IDQN algorithm, which indicates that the capture tactics formulated by the PER-IDQN-based multi-UAV system in complex obstacle environments are more effective and can achieve capture of the target in a shorter time. A simulation of the multi-UAV capture is shown in Fig. 4.
In summary, the PER-IDQN-based multi-UAV enclosure capture tactical method provided by the invention trains the neural networks by offline learning, stores the data generated during training in the experience pool to provide learning samples for optimising the neural networks, and designs the UAV actions and states in combination with the multi-UAV maneuver control and cooperative capture task requirements, thereby realizing intelligent decision control of multiple UAVs.
The multi-UAV enclosure capture tactical method provided by the invention has high model-training efficiency, and the constructed multi-UAV enclosure capture tactical model is applicable to complex dynamic scenes, improving the execution efficiency of multi-UAV enclosure capture tactics.
The above description is only a preferred embodiment of the present invention, and it should be noted that the embodiments of the present invention are not limited to the implementation described above; it will be apparent to those skilled in the art that other variations and modifications can be made without departing from the spirit of the invention. Any equivalent substitutions, modifications and improvements made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.
Claims (4)
1. A multi-unmanned aerial vehicle enclosure tactical method based on PER-IDQN is characterized by comprising the following steps:
step 1: constructing a grid digital map model and an unmanned aerial vehicle model;
step 2: constructing a multi-unmanned aerial vehicle enclosure decision model based on a PER-IDQN algorithm;
Step 3: constructing and training the multi-UAV enclosure capture decision model based on the PER-IDQN algorithm; each UAV then feeds its state information into the neural network, the trained PER-IDQN network performs the fitting and outputs the UAV's flight action, and the pursuing UAVs achieve capture of the target through cooperative decisions.
2. The PER-IDQN-based multi-drone containment tactical method of claim 1, wherein:
the steps of constructing the grid digital map model and the unmanned aerial vehicle model are as follows:
step 1-1: to conveniently quantify the specific position of a UAV, the whole airspace is uniformly divided into grids, each grid being a square with side length l kilometres; for a task scene of a × b grids, the total width is l_width = a·l kilometres and the total length is l_length = b·l kilometres;
step 1-2: setting the speed of the capture unmanned aerial vehicle to be l kilometer/time step, and setting the speed of the target unmanned aerial vehicle to be n x l kilometer/time step;
step 1-3: the size of the UAV action space is set to 4, i.e., at each step a UAV can move only up, down, left or right;
step 1-4: the detection range of each UAV is set to a circular area of radius l kilometres, which in the grid scene is approximated by the 3×3 block of cells (nine-grid) centred on the UAV.
3. The PER-IDQN-based multi-drone containment tactical method of claim 1, wherein:
the step 2 of constructing a multi-unmanned aerial vehicle trapping decision model based on a PER-IDQN algorithm comprises the following steps:
step 2-1: setting an action space A of the capture unmanned aerial vehicle as follows:
A = [(0, -l), (0, l), (-l, 0), (l, 0)]
where (0, -l), (0, l), (-l, 0) and (l, 0) represent the 4 actions of the UAV moving down, up, left and right respectively, and l is the side length of each grid;
step 2-2: setting the state space S of the capture unmanned aerial vehicle as follows:
S = [S_uav, S_teamer, S_obser, S_target, S_finish]
where S_uav, S_teamer, S_obser, S_target and S_finish respectively denote the UAV's own state information, the information of other friendly UAVs, the UAV's detection information, the target information and the task-state information;
specifically, for the i-th UAV in the multi-UAV pursuit system, its own state information S_uav is set from its coordinates, where x_i and y_i denote the coordinate information of the i-th UAV;
for the i-th UAV, the obtainable friendly-UAV state information S_teamer is set from the coordinates of the other pursuing UAVs, where x_i and y_i denote the horizontal and vertical coordinates of the i-th UAV and n denotes the number of UAVs;
o_m denotes the detection reading of the pursuing UAV at the m-th position of the surrounding 3×3 (nine-grid) neighbourhood, and these readings form the detection information S_obser;
in addition, combining the relative distance and bearing of the target with respect to the i-th friendly UAV, the target information S_target obtainable by the i-th pursuing UAV is set, where d_i and θ_i respectively denote the distance and relative azimuth between the pursuing UAV and the target, and x_e and y_e denote the horizontal and vertical coordinates of the escaping target;
in addition, to help the pursuing UAVs complete the encirclement of the target effectively, a sub-state quantity S_finish is set for the i-th pursuing UAV, indicating whether the target has been fully captured;
step 2-3: considering the three decision processes in multi-UAV enclosure capture tactics (maneuvering approach to the target, cooperative encirclement and autonomous obstacle avoidance), the reward function R for each individual pursuing UAV is set as:
R = σ_1·r_pos + σ_2·r_safe + σ_3·r_effi + σ_4·r_task
where r_pos, r_safe, r_effi and r_task respectively denote the position reward, safe-flight reward, efficient-flight reward and task-completion reward, and σ_1~σ_4 are the corresponding weights of the rewards;
specifically, the position sub-reward is set as follows:
r_pos = (|x_e - x_i| + |y_e - y_i|) - (|x_e - x_i| + |y_e - y_i|)'
where (|x_e - x_i| + |y_e - y_i|) and (|x_e - x_i| + |y_e - y_i|)' respectively denote the distance between the UAV and the target at the current time step and at the next time step;
the safe-flight sub-reward r_safe of the pursuing UAV is set as follows:
the efficient-flight sub-reward r_effi of the pursuing UAV is set as follows:
r_effi = -n_stay
where n_stay denotes the number of times the pursuing UAV has stayed at its current grid position;
the task-completion sub-reward r_task of the pursuing UAV is set as follows:
step 2-4: the multi-UAV capture criterion is set as follows: when every pursuing UAV is within one unit grid distance of the target, the target is regarded as unable to escape and the capture task is completed.
4. The PER-IDQN-based multi-drone containment tactical method of claim 1, wherein:
the step 3: constructing a multi-unmanned-plane enclosure capture decision model based on a PER-IDQN algorithm and training;
step 3-1: for each pursuing UAV, construct the main BP neural network (hidden-layer parameters θ_i) and the state-action value function Q(s_t^i, a_t^i; θ_i) of the PER-IDQN algorithm, where the value-function inputs s_t^i and a_t^i are the state and action of UAV i at time t; copy the main network parameters θ_i to the target network θ_i', i.e., θ_i → θ_i', where i denotes the UAV index;
step 3-2: set the experience replay queue size M, the discount factor γ, the maximum number of episodes E, the maximum number of steps T per episode and the experience mini-batch size N_batch; initialise the episode counter e = 0;
step 3-3: initialise the states s_1, …, s_n of the n pursuing UAVs and set the current time step t = 0;
step 3-4: generate a random number z and, for each UAV i, select its action with the ε-greedy rule: a random exploratory action is taken when z is below the greedy coefficient ε_greedy, otherwise the action corresponding to the maximum Q value output by the main network is executed;
step 3-5: execute the action set a_1, …, a_n, compute the reward values r_1, …, r_n, update the states to s'_1, …, s'_n, compute the priorities p_1, …, p_n, and store the transitions together in the experience replay queue;
step 3-6: sample N_batch experiences from the replay queue according to the priority-based probability P(j) = p_j^α / Σ_k p_k^α, where j is the index of the sampled experience, p_j denotes its priority, and the parameter α adjusts the degree of prioritized sampling;
compute the importance-sampling weight coefficient w_j:
w_j = (M · P(j))^(-β) / max_i w_i
where β is a hyper-parameter that adjusts the influence of importance sampling on the PER algorithm and on the model's convergence rate;
compute the temporal-difference error at the current time step, where γ is the reward discount factor, j is the sample index, and θ_i' denotes the target network of the i-th agent;
combining the importance weights w_j, update the current network parameters by minimising the loss function L(θ_i):
Step 3-7: respectively updating the target network parameters of each unmanned aerial vehicle agent:
θ_i' ← τ·θ_i + (1 - τ)·θ_i'
where τ denotes the soft-update scale factor;
step 3-8: increment the step counter t by 1 and check: if t < T and the multi-UAV capture criterion is not met, return to step 3-4; otherwise, go to step 3-9;
step 3-9: increment the episode counter e by 1 and check: if e < E, return to step 3-3; otherwise, finish training and go to step 3-10;
step 3-10: terminate the PER-IDQN training process and save the current network parameters; load the saved parameters into the multi-UAV capture system, where at each time step every UAV feeds its state information into its neural network, the trained PER-IDQN network performs the fitting and outputs the UAV's flight action, and the pursuing UAVs achieve capture of the target through cooperative decisions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210525303.XA CN114815891A (en) | 2022-05-15 | 2022-05-15 | PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210525303.XA CN114815891A (en) | 2022-05-15 | 2022-05-15 | PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114815891A true CN114815891A (en) | 2022-07-29 |
Family
ID=82514417
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210525303.XA Pending CN114815891A (en) | 2022-05-15 | 2022-05-15 | PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114815891A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116166034A (en) * | 2023-04-25 | 2023-05-26 | 清华大学 | Cross-domain collaborative trapping method, device and system |
CN116337086A (en) * | 2023-05-29 | 2023-06-27 | 中国人民解放军海军工程大学 | Method, system, medium and terminal for calculating optimal capturing position of unmanned aerial vehicle network capturing |
CN116337086B (en) * | 2023-05-29 | 2023-08-04 | 中国人民解放军海军工程大学 | Method, system, medium and terminal for calculating optimal capturing position of unmanned aerial vehicle network capturing |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |