CN112824998A - Multi-unmanned aerial vehicle collaborative route planning method and device in Markov decision process


Info

Publication number
CN112824998A
CN112824998A (application CN201911139552.XA)
Authority
CN
China
Prior art keywords: unmanned aerial vehicle, state, drone, collaborative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911139552.XA
Other languages
Chinese (zh)
Inventor
刘蓉
肖颖峰
张衡
梁瑾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Changkong Technology Co ltd
Nanjing Pukou High-Tech Industrial Development Zone Management Committee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing Changkong Technology Co ltd
Nanjing Pukou High-Tech Industrial Development Zone Management Committee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Changkong Technology Co ltd, Nanjing Pukou High-Tech Industrial Development Zone Management Committee and Nanjing University of Aeronautics and Astronautics
Priority to CN201911139552.XA
Publication of CN112824998A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/12: Target-seeking control

Abstract

The invention discloses a multi-unmanned aerial vehicle collaborative route planning method and device based on a Markov decision process. The method comprises the following steps: constructing a Markov process model for the multi-UAV collaborative route planning task from the state space, the action space, the state transfer function and a pre-constructed reward function of the Markov decision process; and executing search strategy iteration on the Markov process model based on a pre-constructed evaluation function, searching for the action sequence that maximizes the evaluation function value, and thereby planning the optimal multi-UAV collaborative route. The invention also introduces radar threats into the reward function for unmanned aerial vehicle flight, and reasonably designs the combat environment and the number of state spaces of the multiple unmanned aerial vehicles. Reasonable and effective flight paths can be planned quickly for the multiple unmanned aerial vehicles while the radar threat cost of the routes is greatly reduced, improving the safety of the unmanned aerial vehicles when executing tasks in a complex environment.

Description

Multi-unmanned aerial vehicle collaborative route planning method and device in Markov decision process
Technical Field
The invention relates to the technical field of unmanned aerial vehicle route planning, in particular to a multi-unmanned aerial vehicle collaborative route planning method and device in a Markov decision process.
Background
With the development of aviation technology, cooperative combat using multiple unmanned aerial vehicles in complex and changeable environments has been widely applied. Research on unmanned aerial vehicle route planning methods reduces the burden and inconvenience of manually planning routes, and at the same time makes full use of known terrain, threat and other information to produce a global route that satisfies the vehicle's own constraints and the task requirements, providing a technical guarantee for low-altitude penetration and hidden flight of the unmanned aerial vehicle. The route planning method is therefore a key component of an unmanned aerial vehicle system: it is an important premise for realizing autonomous flight, an important basis for ensuring that the unmanned aerial vehicle smoothly completes its tasks and accurately strikes enemy targets, and a powerful guarantee for realizing automatic control of the unmanned aerial vehicle. Research on route planning can also improve the overall level of current mission planning and has important practical significance for its further study; it further improves the survival probability of the unmanned aerial vehicle, provides a powerful basis for determining the operational value of a route, and has strong engineering application value and practical significance for the development of unmanned aerial vehicles in China. How to rapidly plan a flight path meeting the constraint conditions is also the key to realizing autonomous planning for the unmanned aerial vehicle.
At present, research on route planning at home and abroad mainly focuses on the route planning algorithm, which plays a decisive role in the autonomous flight, accurate tracking and striking of the unmanned aerial vehicle and determines the efficiency of route planning and even the survival probability of the unmanned aerial vehicle. In flight path planning, different tasks call for different route planning algorithms. When executing a simple reconnaissance task, only one global route needs to be planned from the obtained information, and the unmanned aerial vehicle only needs to load the global route before taking off. When attacking enemy targets, dynamic threats from the enemy often appear, and the route must then be adjusted dynamically on the basis of the global reference route so as to avoid those threats.
At present, common multi-unmanned aerial vehicle collaborative route planning methods at home and abroad include the ant colony algorithm, the genetic algorithm, the A* algorithm and the like. The ant colony algorithm has strong robustness and good information feedback capability, but its convergence rate is low and it easily falls into local optima. The genetic algorithm is robust because it does not depend on the characteristics of the model, but in a complex battlefield environment its convergence is slow, so the path search time is long. The A* algorithm is simple and easy for engineers to implement, but its computation is heavy and its planning time long.
Disclosure of Invention
The invention aims to overcome at least one defect of the prior-art multi-unmanned aerial vehicle collaborative route planning methods.
In one aspect, the invention provides a multi-unmanned aerial vehicle collaborative route planning method based on an improved Markov decision process, which comprises the following steps:
determining starting points, task target points and radar threat areas of the multiple unmanned aerial vehicles in a flight scene according to the collaborative route planning task of the multiple unmanned aerial vehicles;
determining all state spaces, action spaces and state transfer functions in the motion process of the multiple unmanned aerial vehicles according to the starting points, the task target points and the radar threat areas of the multiple unmanned aerial vehicles;
constructing a Markov decision process model under a multi-unmanned-aerial-vehicle collaborative route planning task according to a state space, an action space, a state transfer function and a pre-constructed reward function in the Markov decision process;
and executing search strategy iteration based on a Markov decision process model based on a pre-constructed evaluation function, and searching an action sequence which enables the evaluation function value to be maximum, thereby planning the optimal multi-unmanned aerial vehicle collaborative air route.
Before the unmanned aerial vehicle takes off, the planned optimal multi-unmanned aerial vehicle collaborative air path is loaded to execute the collaborative flight task.
According to the multi-unmanned-aerial-vehicle collaborative route planning method based on the improved Markov decision process, preferably, modeling is carried out on the flight environment of the unmanned aerial vehicle, the task environment is initialized, a grid method is adopted to carry out two-dimensional space modeling on the flight environment of the unmanned aerial vehicle, influences such as terrain obstacles and severe weather are ignored for simplifying the model, only enemy radar threats are considered, and the flight scene comprises starting points of the multi-unmanned aerial vehicle, task target points and radar threat areas.
The method for planning the collaborative route of the multiple unmanned aerial vehicles based on the improved Markov decision process preferably further comprises the steps of calculating a comprehensive navigation cost function value of the multiple unmanned aerial vehicles based on a pre-constructed comprehensive navigation cost function of the multiple unmanned aerial vehicles in the search strategy iteration process, and taking a path with the minimum comprehensive navigation cost function value of the multiple unmanned aerial vehicles as an optimal collaborative route of the multiple unmanned aerial vehicles;
the method comprises the following steps of presetting a multi-unmanned aerial vehicle comprehensive navigation cost function for evaluating performance indexes of a planned route, and specifically:
for a single unmanned aerial vehicle, the route cost mainly comprises the fuel cost, the threat cost and the like. For multi-unmanned aerial vehicle collaborative route planning, the route cost must not only account for the navigation cost of a single vehicle but also satisfy the multi-vehicle cooperative navigation cost. The comprehensive navigation cost of the multiple unmanned aerial vehicles is described by the following cost equation:
J_i = W_1 J_{l,i} + W_2 J_{r,i} + W_3 J_t  (1)
in the formula: w1、W2And W3Weights for fuel cost, threat cost and synergy cost, Jl,iRepresenting the fuel cost when the length of the path segment under the ith path segment is l, and relating to the flight range of the unmanned aerial vehicle; j. the design is a squarer,iRepresenting the threat cost when the radar threat under the ith navigation route section is r; j. the design is a squaretIt varies with the time of flight of the drone, at a synergistic cost. According to the calculation formula (1) of the comprehensive navigation cost function of the multiple unmanned aerial vehicles, the comprehensive navigation cost of the planned route can be calculated, and the comprehensive navigation is selected to be smallerThe air route is taken as the final air route planned by the algorithm, so that the safety of the unmanned aerial vehicle for executing tasks in a complex environment is ensured. In the method for collaborative route planning of multiple drones based on the improved markov decision process, preferably, the markov decision process Model (MDP) is used as the following four-tuple M ═ for the markov decision process model<S,A,P,R>To show that:
s represents a finite set of system states, including finite state points of the drone flight environment. An environment model of the drone is established in a two-dimensional coordinate system according to step S11, where different coordinate points in the environment model represent different states of the drone, and each state corresponds to an element in the set S of state spaces.
A represents the finite set of actions available to the drone. The flight of the unmanned aerial vehicle is a continuous process, but in multi-UAV route planning, once the starting point and target point of each unmanned aerial vehicle are set, the unmanned aerial vehicle is regarded as a particle during planning. Since the flight environment of the drone is built with the grid method, the drone is defined to have 8 executable actions, a = 1, 2, 3, …, 8. These actions divide the entire 360° equally, so the angle between two adjacent actions is 45°.
P is the state transition function, indicating the probability that the drone in state s_t, after performing action a_t ∈ A, transitions to state s_{t+1}. The state transition probabilities may change with the target state, threat conditions and so on. Given the current state of the drone and the executed action, the distribution of the state transition probabilities largely determines the drone's action selection at the next moment. The state transition probability can be expressed as:

P(s′|s, a) = P(s_{t+1} = s′ | s_t = s, a_t = a)  (2)

∑_{s′∈S} P(s′|s, a) = 1  (3)

where s ∈ S denotes an instance of the drone state, a ∈ A denotes an instance of the drone action, s_t denotes the state of the drone at time t, and a_t denotes the action selected by the drone at time t.
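For illustration, the four-tuple components above can be held in code as plain mappings. The sketch below, a minimal assumption-laden example rather than the patent's implementation, stores the transition function P of formula (2) as a nested dictionary and checks the normalization constraint of formula (3); all names and the toy probabilities are illustrative.

```python
from typing import Dict, Tuple

State = Tuple[int, int]   # grid coordinate (x, y), one element of S
Action = int              # one of the 8 executable actions, a = 1..8

# P[(s, a)] maps each successor state s' to P(s' | s, a), as in formula (2)
TransitionFn = Dict[Tuple[State, Action], Dict[State, float]]

def check_normalized(P: TransitionFn, tol: float = 1e-9) -> None:
    """Verify formula (3): successor probabilities sum to 1 for every (s, a)."""
    for (s, a), successors in P.items():
        total = sum(successors.values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"P(.|{s},{a}) sums to {total}, not 1")

# Toy example: from state (0, 0), action 1 usually moves one cell east.
P: TransitionFn = {((0, 0), 1): {(1, 0): 0.9, (1, 1): 0.05, (1, -1): 0.05}}
check_normalized(P)
```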
The unmanned aerial vehicle takes safely reaching the target point as its task goal, so when flying from the initial point to the target point, the movement direction of the unmanned aerial vehicle is guided by the direction of the target point. The angle between the line connecting the target point to the unmanned aerial vehicle and the x direction is defined as θ, and the unmanned aerial vehicle can be controlled to continuously adjust its actions according to the position of the target point so as to move toward it. According to θ, the 360° space around the target point is divided into 8 position states at 45° intervals, which are discretized into the target point position space T_state. The discretization rule is as follows:
[Formula (4), the discretization rule mapping θ to the 8 position states of T_state, is rendered only as an image in the original.]
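Since formula (4) survives only as an image, the following sketch shows one plausible way to discretize the bearing θ into the 8 position states of T_state at 45° intervals; the exact bin boundaries and state numbering are assumptions, not the patent's rule.

```python
import math

def target_position_state(ux: float, uy: float, tx: float, ty: float) -> int:
    """Discretize the bearing theta from the UAV (ux, uy) to the target (tx, ty)
    into one of 8 position states at 45-degree intervals (assumed binning)."""
    theta = math.degrees(math.atan2(ty - uy, tx - ux)) % 360.0
    # Assumed convention: state 1 is centered on theta = 0; bins are 45 deg wide.
    return int(((theta + 22.5) % 360.0) // 45.0) + 1

assert target_position_state(0, 0, 10, 0) == 1   # target due east  -> state 1
assert target_position_state(0, 0, 0, 10) == 3   # target due north -> state 3
```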
When the position of the target point is known, in order to control the unmanned aerial vehicle to move toward the target point, the executable actions of the unmanned aerial vehicle are limited: the drone gives the action toward the grid cell in the target direction with high probability, and may enter the adjacent cells with a certain, lower probability. When the drone is within a given position space of the target point, it has 5 executable actions, each with a different probability, so over the 8 position spaces there are 5 × 8 = 40 action output states. Those skilled in the art can design the state transition probabilities of the executable actions of the drone according to the actual mission.
R is the reward function, representing the immediate reward obtained given the current state and action of the drone. In a Markov model system, the reward function is a penalty or reward signal fed back by the environment after the UAV makes a motion decision and interacts with the environment. It represents the quality of the action taken by the unmanned aerial vehicle in a given state, and is an important basis for guiding the unmanned aerial vehicle to make flight decisions and safely avoid obstacles. The reward function is designed in advance for the problems of safety and of tending toward the target point during route planning, and reward function models with a model-free uniform structure, R_movegoal and R_avoidobstacle, are introduced:
[Formulas (5) and (6), defining R_movegoal and R_avoidobstacle, are rendered only as images in the original.]
R_movegoal is the reward function model during normal flight of the unmanned aerial vehicle, and R_avoidobstacle is the reward function model when the unmanned aerial vehicle encounters a threat;
In state s, the reward function R(s, a) obtained by the drone selecting action a is expressed as follows:

R(s, a) = R_movegoal + R_avoidobstacle  (7)
In multi-UAV collaborative route planning, the unmanned aerial vehicles are exposed to radar threats throughout the flight; although the basic Markov model algorithm can plan effective paths for the unmanned aerial vehicles, those paths may still be detected by radar. Therefore, in order to further reduce the probability of the unmanned aerial vehicle being detected by radar, a radar threat model R_threat with a non-uniform structure is proposed and introduced into the reward function:
[Formula (8), defining R_threat, is rendered only as an image in the original.]
where R_threat is the radar threat reward function model during unmanned aerial vehicle flight, giving a negative reward for the radar threats encountered in flight; L is the length of the path segment after the unmanned aerial vehicle makes its action decision; N is the number of radar threats; and d_{k/4,i} (k = 1, 2, 3) is the distance between the k/4 point of the segment and the i-th radar threat. In state s, the reward function R(s, a) obtained by the drone selecting action a is expressed as follows:

R(s, a) = R_movegoal + R_avoidobstacle + R_threat  (9)
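As an illustration of the composed reward of formula (9), the Python sketch below treats R_movegoal and R_avoidobstacle as precomputed inputs (their defining formulas (5) and (6) appear only as images above) and assumes an inverse-distance form for R_threat sampled at the three quarter points d_{k/4,i} named in the text; the fourth-power decay is an assumption, not the patent's formula (8).

```python
import math

def radar_threat_reward(segment_start, segment_end, radars):
    """Assumed form of R_threat: a negative reward that grows as the quarter
    points (k/4, k = 1, 2, 3) of the flight segment approach each of the N
    radar threats, following the quantities described around formula (8)."""
    (x0, y0), (x1, y1) = segment_start, segment_end
    L = math.hypot(x1 - x0, y1 - y0)      # segment length after the action decision
    penalty = 0.0
    for rx, ry in radars:                  # the N radar threats
        for k in (1, 2, 3):                # the k/4 points of the segment
            px = x0 + k / 4 * (x1 - x0)
            py = y0 + k / 4 * (y1 - y0)
            d = max(math.hypot(px - rx, py - ry), 1e-6)   # d_{k/4,i}
            penalty += L / d ** 4          # assumed inverse-4th-power decay
    return -penalty

def total_reward(r_movegoal, r_avoidobstacle, segment_start, segment_end, radars):
    """Formula (9): R(s, a) = R_movegoal + R_avoidobstacle + R_threat.
    The first two terms come from formulas (5)-(6), which are images in the
    original, so they enter here as precomputed inputs."""
    return r_movegoal + r_avoidobstacle + radar_threat_reward(
        segment_start, segment_end, radars)
```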
the invention combines radar threat cost and a Markov model, reasonably designs the Markov decision model, and provides a multi-unmanned-aerial-vehicle air route planning algorithm based on the improved Markov decision model. Under the complex multi-threat environment, the flight path planning is carried out for multiple unmanned aerial vehicles, so that reasonable and effective flight paths can be quickly planned for the multiple unmanned aerial vehicles, meanwhile, the threat cost and the comprehensive cost of the paths of the multiple unmanned aerial vehicles are greatly reduced, and the safety of the unmanned aerial vehicles in executing tasks under the complex environment is improved.
Preferably, the goal of multi-UAV collaborative route planning based on the Markov decision model is to plan effective routes for the unmanned aerial vehicles through interaction between the drones' actions and the flight environment and final decision generation. The unmanned aerial vehicle agent selects and executes an action a according to the current environment state s, so that its state transfers from s to s′ while it obtains the reward R, and this cycle repeats until the target state is finally reached. That is, multi-UAV collaborative route planning amounts to finding the optimal strategy π*: executing a search strategy according to the current state of the drone and searching for the action sequence that maximizes the expected reward, i.e., the evaluation function V^π(s).
The optimal strategy π* satisfies V*(s) = max_π V^π(s) for all states s ∈ S, and the evaluation function corresponding to the optimal strategy π* is called the optimal evaluation function V*(s). The generation process of the optimal strategy is called strategy iteration. The optimal strategy π* and the maximum reward V*(s) can be found using dynamic programming. In the infinite-horizon discounted model, the evaluation function V^π(s) can be described as:
V^π(s) = E[∑_{t=0}^{∞} γ^t R_t]  (10)
where γ is the discount factor and γ^t is the discount factor at time t; γ is taken as 0.9. R_t is the reward function value at time t, s is the state of the unmanned aerial vehicle at time t = 0, and s′ is the state of the unmanned aerial vehicle at the next moment. The above formula can then be rewritten recursively as:
V^π(s) = R(s, π(s)) + γ ∑_{s′∈S} P(s′|s, a) V^π(s′)  (11)

The above formula gives a method for calculating the evaluation function corresponding to a strategy, and defines the state-action value function Q^π(s, a) as an intermediate variable in solving the evaluation function. Given the initial state s and the current action a of the drone, the drone will move to the next state s′ at the next moment with probability P(s′|s, a) and follow this rule thereafter; the state-action value function Q^π(s, a) can then be expressed as:
Q^π(s, a) = R(s, a) + γ ∑_{s′∈S} P(s′|s, a) V^π(s′)  (12)
where R(s, a) is the reward obtained when the unmanned aerial vehicle selects action a in state s.
At this point, the optimal strategy π*(s) of the MDP can be expressed as:
π*(s) = arg max_{a∈A} Q^π(s, a) = arg max_{a∈A} {R(s, a) + γ ∑_{s′∈S} P(s′|s, a) V^π(s′)}  (13)
Accordingly, the optimal evaluation function V*(s) can be expressed as:
V*(s) = max_{a∈A} {R(s, a) + γ ∑_{s′∈S} P(s′|s, a) V^π(s′)}  (14)
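The strategy iteration described by formulas (10)-(14) can be sketched as a value-iteration loop over a finite state set. The sketch below is illustrative only: the data layout, stopping threshold and function names are assumptions, not part of the patent.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]
Action = int

def solve_mdp(states: List[State],
              actions: List[Action],
              P: Dict[Tuple[State, Action], Dict[State, float]],
              R: Dict[Tuple[State, Action], float],
              gamma: float = 0.9,
              eps: float = 1e-6):
    """Iterate V(s) = max_a {R(s,a) + gamma * sum_s' P(s'|s,a) V(s')} as in
    formula (14), then read off pi*(s) = argmax_a Q(s,a) as in formula (13)."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            q = [R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
                 for a in actions]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best                      # in-place Bellman backup
        if delta < eps:                      # assumed convergence threshold
            break
    pi = {s: max(actions,
                 key=lambda a: R[(s, a)] + gamma *
                 sum(p * V[s2] for s2, p in P[(s, a)].items()))
          for s in states}
    return V, pi
```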
In another aspect, the invention provides a multi-unmanned aerial vehicle collaborative route planning device based on an improved Markov decision process, comprising an unmanned aerial vehicle collaborative system modeling module, a Markov process model building module and a multi-unmanned aerial vehicle collaborative route planning module. The unmanned aerial vehicle collaborative system modeling module is used for determining the starting points, task target points and radar threat areas of the multiple unmanned aerial vehicles in the flight scene according to the multi-UAV collaborative route planning task;
the Markov process model building module is used for determining all state spaces, action spaces and state transfer functions in the motion process of the multiple unmanned aerial vehicles according to the starting points, the task target points and the radar threat regions of the multiple unmanned aerial vehicles; constructing a Markov decision process model under a multi-unmanned-aerial-vehicle collaborative route planning task according to a state space, an action space, a state transfer function and a pre-constructed reward function in the Markov decision process;
the multi-unmanned aerial vehicle collaborative route planning module is used for executing search strategy iteration based on a Markov decision process model based on a pre-constructed evaluation function, and searching an action sequence enabling the evaluation function value to be maximum, so that an optimal multi-unmanned aerial vehicle collaborative route is planned.
Further, the pre-constructed reward function R comprises R_movegoal for normal flight of the unmanned aerial vehicle and R_avoidobstacle for when the unmanned aerial vehicle encounters a threat, expressed as follows:
[Formulas (5) and (6), defining R_movegoal and R_avoidobstacle, are rendered only as images in the original.]
R_movegoal is the reward function model during normal flight of the unmanned aerial vehicle, and R_avoidobstacle is the reward function model when the unmanned aerial vehicle encounters a threat;
In state s, the reward function R(s, a) obtained by the drone selecting action a is expressed as follows:

R(s, a) = R_movegoal + R_avoidobstacle
Still further, the reward function also comprises the radar threat reward function R_threat during unmanned aerial vehicle flight, expressed as follows:
[Formula (8), defining R_threat, is rendered only as an image in the original.]
where R_threat is the radar threat reward function during unmanned aerial vehicle flight; L is the length of the path segment after the unmanned aerial vehicle makes its action decision; N is the number of radar threats; and d_{k/4,i} (k = 1, 2, 3) is the distance between the k/4 point of the segment and the i-th radar threat;
In state s, the reward function R(s, a) obtained by the drone selecting action a is expressed as follows:

R(s, a) = R_movegoal + R_avoidobstacle + R_threat
the invention achieves the following beneficial technical effects:
the invention provides a multi-unmanned aerial vehicle route planning algorithm based on an improved Markov decision process model, aiming at the problem that a multi-unmanned aerial vehicle is easily affected by environmental threats when executing combat tasks in a complex environment. According to the method, starting points, task target points and radar threat areas of the multiple unmanned aerial vehicles in a flight scene are determined according to the collaborative route planning task of the multiple unmanned aerial vehicles; determining all state spaces, action spaces and state transfer functions in the motion process of the multiple unmanned aerial vehicles according to the starting points, the task target points and the radar threat areas of the multiple unmanned aerial vehicles; constructing a reward function, and constructing a Markov process model under the collaborative route planning task of the multiple unmanned aerial vehicles according to a state space, an action space, a state transfer function and the reward function in the Markov decision process; and constructing an evaluation function, executing search strategy iteration based on Markov decision based on the evaluation function, and searching an action sequence which enables the evaluation function value to be maximum, thereby planning the optimal multi-unmanned aerial vehicle collaborative air route.
The invention also introduces the radar threat into the reward function during unmanned aerial vehicle flight; using the discretized radar threat information, the algorithm reasonably designs the combat environment and the number of state spaces of the multiple unmanned aerial vehicles, discretizes the target point position space, and accordingly assigns the state transition probabilities reasonably;
Combining the radar threat with the Markov decision process model, a radar threat model with a non-uniform structure is proposed and introduced on the basis of the reward function with a model-free uniform structure, establishing an improved Markov decision process model;
according to the invention, by setting the multi-unmanned aerial vehicle comprehensive navigation cost function, the multi-unmanned aerial vehicle comprehensive navigation cost function value is calculated after searching the action sequence which enables the evaluation function value to be maximum, and the path with the minimum multi-unmanned aerial vehicle comprehensive navigation cost function value is taken as the optimal unmanned aerial vehicle collaborative airway, so that not only can a reasonable and effective flight path be planned for the multi-unmanned aerial vehicle rapidly, but also the threat cost and the airway comprehensive cost of the multi-unmanned aerial vehicle airway are greatly reduced, and the safety of the unmanned aerial vehicle in executing tasks in a complex environment is improved.
Drawings
FIG. 1 is an algorithmic flow chart of a method of route planning in accordance with an embodiment of the present invention;
FIG. 2 is a diagram of a multi-UAV environment model according to an embodiment of the present invention;
FIG. 3 is a diagram of basic operation of a drone in accordance with an embodiment of the present invention;
FIG. 4 is a schematic illustration of a position state of an embodiment of the present invention;
FIG. 5 is a diagram of simulation results in a simple environment according to an embodiment of the present invention;
FIG. 6 is a diagram of simulation results in a complex environment according to an embodiment of the present invention, wherein (a) is a single-target simulation result diagram in a complex environment and (b) is a multi-target simulation result diagram in a complex environment.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
As shown in fig. 1, a multi-unmanned aerial vehicle collaborative route planning method based on an improved Markov decision process includes the following steps:
step S1, constructing the flight environment of the unmanned aerial vehicle according to the nature of the multi-unmanned aerial vehicle collaborative route planning task, establishing a route evaluation system, and determining the starting points, the task target points and the radar threat areas of the multi-unmanned aerial vehicle in the flight scene; the method comprises the following steps:
modeling the flight environment of the unmanned aerial vehicle and initializing the task environment: a grid method is used to model the flight environment in two dimensions, with each grid cell 5 km in size. To simplify the model, the influence of terrain obstacles, severe weather and the like is ignored and only enemy radar threats are considered. The flight scene comprises the starting points, task target points and radar threat areas of the multiple unmanned aerial vehicles; the multi-UAV environment model is shown in FIG. 2;
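A minimal sketch of this environment initialization, assuming the 5 km grid cell stated above; the class layout is illustrative, the sample start and target coordinates are taken from the simulation scenario described later ((4,40), (60,5), (90,90) with target (50,50) and radar radius 2), and the radar positions are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

CELL_KM = 5.0  # each grid cell is 5 km in size, per the text

@dataclass
class Environment:
    width: int                                    # grid columns
    height: int                                   # grid rows
    starts: List[Tuple[int, int]]                 # one start cell per UAV
    targets: List[Tuple[int, int]]                # task target cells
    radars: List[Tuple[int, int, float]] = field(default_factory=list)
    # each radar threat is (x, y, detection radius in cells)

    def in_radar_threat(self, x: int, y: int) -> bool:
        """True if cell (x, y) lies inside any radar threat area."""
        return any((x - rx) ** 2 + (y - ry) ** 2 <= r ** 2
                   for rx, ry, r in self.radars)

# Placeholder scenario; the embodiment's experiments use 64 radar threats.
env = Environment(width=100, height=100,
                  starts=[(4, 40), (60, 5), (90, 90)],
                  targets=[(50, 50)],
                  radars=[(30, 30, 2.0), (70, 60, 2.0)])
```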
step S2, determining all state spaces, action spaces and state transfer functions in the motion process of the multiple unmanned aerial vehicles according to the starting points, the task target points and the radar threat areas of the multiple unmanned aerial vehicles;
constructing a Markov decision process model under a multi-unmanned-aerial-vehicle collaborative route planning task according to a state space, an action space, a state transfer function and a pre-constructed reward function in the Markov decision process;
and executing search strategy iteration based on a Markov decision process model based on a pre-constructed evaluation function, and searching an action sequence which enables the evaluation function value to be maximum, thereby planning the optimal multi-unmanned aerial vehicle collaborative air route.
The method specifically comprises the following steps;
and step S21, combining the radar threat cost with a Markov decision process model, and respectively constructing the model aiming at a state space, an action space, a state transfer function and a reward function in the Markov decision process.
The Markov decision process model (MDP) is represented by the following four-tuple M = <S, A, P, R>:
s represents a finite set of system states, including finite state points of the drone flight environment. An environment model of the drone is established in a two-dimensional coordinate system according to step S11, where different coordinate points in the environment model represent different states of the drone, and each state corresponds to an element in the set S of state spaces.
A represents the finite set of actions available to the drone. The flight of the unmanned aerial vehicle is a continuous process, but in multi-UAV route planning, once the starting point and target point of each unmanned aerial vehicle are set, the unmanned aerial vehicle is regarded as a particle during planning. Since the flight environment of the drone is built with the grid method, the drone is defined to have 8 executable actions, a = 1, 2, 3, …, 8. These actions divide the entire 360° equally, so the angle between two adjacent actions is 45°. The basic action diagram of the drone is shown in FIG. 3.
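On the grid, the 8 executable actions spaced 45° apart map naturally to unit cell offsets. In the sketch below, which action index corresponds to which heading is an assumed convention, since FIG. 3 is not reproduced here.

```python
import math

# Action a = 1..8 at 45-degree intervals, assumed to start due east.
ACTION_DELTAS = {
    a: (round(math.cos(math.radians(45 * (a - 1)))),
        round(math.sin(math.radians(45 * (a - 1)))))
    for a in range(1, 9)
}
# e.g. action 1 -> (1, 0), action 2 -> (1, 1), action 5 -> (-1, 0)

def apply_action(x: int, y: int, a: int) -> Tuple[int, int]:
    """Move one grid cell in the direction of action a."""
    dx, dy = ACTION_DELTAS[a]
    return x + dx, y + dy

from typing import Tuple  # placed here for clarity; normally at the top
```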
P is the state transition function, indicating the probability that the drone in state s_t, after performing action a_t ∈ A, transitions to state s_{t+1}. The state transition probabilities may change with the target state, threat conditions and so on. Given the current state of the drone and the executed action, the distribution of the state transition probabilities largely determines the drone's action selection at the next moment. The state transition probability can be expressed as:

P(s′|s, a) = P(s_{t+1} = s′ | s_t = s, a_t = a)  (2)

∑_{s′∈S} P(s′|s, a) = 1  (3)

where s ∈ S denotes an instance of the drone state, a ∈ A denotes an instance of the drone action, s_t denotes the state of the drone at time t, and a_t denotes the action selected by the drone at time t.
The unmanned aerial vehicle takes safely reaching the target point as its task goal, so when flying from the initial point to the target point, the movement direction of the unmanned aerial vehicle is guided by the direction of the target point. The angle between the line connecting the target point to the unmanned aerial vehicle and the x direction is defined as θ, and the unmanned aerial vehicle can be controlled to continuously adjust its actions according to the position of the target point so as to move toward it. According to θ, the 360° space around the target point can be divided at 45° intervals and discretized into 8 position states, as shown in FIG. 4. The discretization rule for the target point position space T_state is as follows:
[Formula (4), the discretization rule mapping θ to the 8 position states of T_state, is rendered only as an image in the original.]
When the position of the target point is known, in order to control the unmanned aerial vehicle to move toward the target point, the executable actions of the unmanned aerial vehicle are limited: the drone gives the action toward the grid cell in the target direction with high probability, and may enter the adjacent cells with a certain, lower probability. When the drone is in a given position space of the target point, it has 5 executable actions, each with a different probability, so over the 8 position spaces there are 5 × 8 = 40 action output states (the basic action diagram of the drone is shown in FIG. 3 and the position state schematic in FIG. 4). The partial state transition probability design for the executable actions of the drone in this embodiment is shown in Table 1; a code sketch of such an assignment follows the table.
TABLE 1 Partial state transition probability design

[Table 1 is rendered only as an image in the original.]
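Because Table 1 survives only as an image, the probabilities in the sketch below are placeholders; it merely illustrates the stated design: five executable actions per position state, with the action toward the target cell carrying the highest probability and its neighbouring actions lower ones.

```python
def transition_probs(position_state: int) -> dict:
    """Probabilities over the 5 executable actions for one of the 8 position
    states. Values are placeholders, since Table 1 is an image in the
    original; the alignment of position-state and action indices is assumed."""
    toward = position_state            # assumed: state k points along action k

    def rot(a: int, k: int) -> int:
        """Rotate action index a by k steps of 45 degrees (1..8, circular)."""
        return (a - 1 + k) % 8 + 1

    return {
        toward: 0.60,                  # head straight for the target cell
        rot(toward, 1): 0.15,          # adjacent actions, lower probability
        rot(toward, -1): 0.15,
        rot(toward, 2): 0.05,
        rot(toward, -2): 0.05,
    }

assert abs(sum(transition_probs(1).values()) - 1.0) < 1e-12
```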
R is the reward function, representing the immediate reward obtained given the current state and action of the drone. In a Markov model system, the reward function is a penalty or reward signal fed back by the environment after the UAV makes a motion decision and interacts with the environment. It represents the quality of the action taken by the unmanned aerial vehicle in a given state, and is an important basis for guiding the unmanned aerial vehicle to make flight decisions and safely avoid obstacles. A reward function is designed for the problems of safety and of tending toward the target point during route planning, and reward function models with a model-free uniform structure, R_movegoal and R_avoidobstacle, are constructed in advance:
[Formulas (5) and (6), defining R_movegoal and R_avoidobstacle, are rendered only as images in the original.]
R_movegoal is the reward function model during normal flight of the unmanned aerial vehicle, and R_avoidobstacle is the reward function model when the unmanned aerial vehicle encounters a threat.
In state s, the reward function R(s, a) obtained by the drone selecting action a is expressed as follows:

R(s, a) = R_movegoal + R_avoidobstacle  (7)
In multi-UAV collaborative route planning, the unmanned aerial vehicles are exposed to radar threats throughout the flight; although the basic Markov model algorithm can plan effective paths for the unmanned aerial vehicles, those paths may still be detected by radar. Therefore, in order to further reduce the probability of the unmanned aerial vehicle being detected by radar, a radar threat model R_threat with a non-uniform structure is proposed and introduced into the reward function:
[Formula (8), defining R_threat, is rendered only as an image in the original.]
where R_threat is the radar threat reward function model during unmanned aerial vehicle flight, giving a negative reward for the radar threats encountered in flight; L is the length of the path segment after the unmanned aerial vehicle makes its action decision; N is the number of radar threats; and d_{k/4,i} (k = 1, 2, 3) is the distance between the k/4 point of the segment and the i-th radar threat. Once the current state s and the executed action a of the unmanned aerial vehicle are given, R_movegoal and R_avoidobstacle can be determined through formulas (5) and (6) from the change in distance between the unmanned aerial vehicle and the target point and between the unmanned aerial vehicle and the obstacles, and R_threat can be obtained through formula (8) from the distance relation between the unmanned aerial vehicle and the radars. In state s, the reward obtained by the drone selecting action a is:

R(s, a) = R_movegoal + R_avoidobstacle + R_threat  (9)
step S22, a search strategy is executed based on a pre-constructed evaluation function for the established markov decision process model, and an action sequence that maximizes the evaluation function is searched for.
The goal of multi-UAV route planning based on the Markov decision model is to plan effective routes for the unmanned aerial vehicles through interaction between the drones' actions and the flight environment and final decision generation. The unmanned aerial vehicle agent selects and executes an action a according to the current environment state s, so that its state transfers from s to s′ while it obtains the reward R, and this cycle repeats until the target state is finally reached. That is, multi-UAV collaborative route planning amounts to finding the optimal strategy π*: executing a search strategy according to the current state of the drone and searching for the action sequence that maximizes the expected reward, i.e., the evaluation function V^π(s).
The optimal strategy π* satisfies V*(s) = max_π V^π(s) for all states s ∈ S, and the evaluation function corresponding to the optimal strategy π* is called the optimal evaluation function V*(s).

The generation process of the optimal strategy is called strategy iteration. The optimal strategy π* and the maximum reward V*(s) can be found using dynamic programming. In the infinite-horizon discounted model, the evaluation function V^π(s) can be described as:
V^π(s) = E[∑_{t=0}^{∞} γ^t R_t]  (10)
where γ is the discount factor and γ^t is the discount factor at time t; γ is taken as 0.9. R_t is the reward value at time t, s is the state of the unmanned aerial vehicle at time t = 0, and s′ is the state of the unmanned aerial vehicle at the next moment. The above formula can then be rewritten recursively as:
V^π(s) = R(s, π(s)) + γ ∑_{s′∈S} P(s′|s, a) V^π(s′)  (11)
The above formula gives a method for calculating the evaluation function corresponding to a strategy, and defines the state-action value function Q^π(s, a) as an intermediate variable in solving the evaluation function. Given the initial state s and the current action a of the drone, the drone will move to the next state s′ at the next moment with probability P(s′|s, a) and follow this rule thereafter; the state-action value function Q^π(s, a) can then be expressed as:
Q^π(s, a) = R(s, a) + γ ∑_{s′∈S} P(s′|s, a) V^π(s′)  (12)
where R(s, a) is the reward obtained when the unmanned aerial vehicle selects action a in state s.
At this point, the optimal strategy π*(s) of the MDP can be expressed as:
π*(s) = arg max_{a∈A} Q^π(s, a) = arg max_{a∈A} {R(s, a) + γ ∑_{s′∈S} P(s′|s, a) V^π(s′)}  (13)
Accordingly, the optimal evaluation function V*(s) can be expressed as:
V*(s) = max_{a∈A} {R(s, a) + γ ∑_{s′∈S} P(s′|s, a) V^π(s′)}  (14)
on the basis of the embodiment, a multi-unmanned aerial vehicle comprehensive navigation cost function is further set for evaluating performance indexes of a planned route, a multi-unmanned aerial vehicle comprehensive navigation cost function value is calculated in a search strategy iteration process, and a path with the minimum multi-unmanned aerial vehicle comprehensive navigation cost function value is used as an optimal unmanned aerial vehicle collaborative route;
for a single unmanned aerial vehicle, the route cost mainly comprises the fuel cost, the threat cost and the like. For multi-unmanned aerial vehicle collaborative route planning, the route cost must not only account for the navigation cost of a single vehicle but also satisfy the multi-vehicle cooperative navigation cost. The comprehensive navigation cost of the multiple unmanned aerial vehicles is described by the following cost equation:
J_i = W_1 J_{l,i} + W_2 J_{r,i} + W_3 J_t  (1)
in the formula: w1、W2And W3Weights for fuel cost, threat cost and synergy cost, Jr,iRepresenting fuel cost and relating to the flight range of the unmanned aerial vehicle; j. the design is a squarer,iAt a threat cost; j. the design is a squaretIt varies with the time of flight of the drone, at a synergistic cost.
Simulation analysis:
FIG. 5 shows the simulation results in a simple environment according to an embodiment of the present invention. To obtain Matlab simulation results for the basic ant colony algorithm and the basic MDP (Markov decision process) model planning algorithm in a simple environment, the relevant parameters are initialized as follows: the departure points are set to (4,40), (60,5) and (90,90), the target point is set to (50,50), the radar threat radius is uniformly 2 (in kilometers), and the number of radar threats is 64. Table 2 gives the experimental data of the basic ant colony algorithm and the basic MDP model planning algorithm in the simple environment. As seen from FIG. 5, both algorithms can plan a feasible path: the solid line represents the feasible path planned by the basic MDP model planning algorithm, and the dotted line the path planned by the basic ant colony algorithm. As seen from Table 2, in the same target environment both algorithms plan feasible paths for the unmanned aerial vehicles, but the basic MDP model planning algorithm has a shorter planning time, and its planned route has a smaller threat cost and comprehensive route cost.
TABLE 2 Experimental data of the basic ant colony algorithm and the basic MDP model planning algorithm

Experimental method            Planning time/ms    Threat cost    Comprehensive cost
Basic ant colony algorithm     352                 470            324
Basic MDP model algorithm      223                 358            263
FIG. 6 shows the simulation results in a complex environment according to an embodiment of the present invention, where (a) is the single-target simulation result diagram and (b) the multi-target simulation result diagram. To obtain Matlab simulation results for the basic MDP model algorithm and the improved MDP model algorithm (here, the improved MDP model algorithm is a specific embodiment of the radar threat model with a non-uniform structure proposed and introduced into the reward function in the present invention) on a single-target task and a multi-target task in a complex environment, the relevant parameters are initialized as follows: the single-target mission departure points are set to (4,40), (60,5) and (90,90) with target point (50,50); the multi-target mission departure points are set to (10,10), (60,5) and (85,5) with target points (20,80), (50,85) and (70,85); the radar threat radii are not uniform; and the number of radar threats is 64. Table 3 gives the experimental data of the basic MDP model algorithm and the improved MDP model planning algorithm in the complex environment. From Table 3 it can be seen that the improved MDP model algorithm plans reasonable and effective flight paths for the multiple unmanned aerial vehicles while greatly reducing the route threat cost and the comprehensive cost, improving the safety of the unmanned aerial vehicles executing tasks in the complex environment.
TABLE 3 Experimental data of the basic MDP model algorithm and the improved MDP model planning algorithm

[Table 3 is rendered only as an image in the original.]
Embodiment: a multi-unmanned aerial vehicle collaborative route planning device based on an improved Markov decision process comprises an unmanned aerial vehicle collaborative system modeling module, a Markov process model building module and a multi-unmanned aerial vehicle collaborative route planning module. The unmanned aerial vehicle collaborative system modeling module is used for determining the starting points, task target points and radar threat areas of the multiple unmanned aerial vehicles in the flight scene according to the multi-UAV collaborative route planning task;
the Markov process model building module is used for determining all state spaces, action spaces and state transfer functions in the motion process of the multiple unmanned aerial vehicles according to the starting points, the task target points and the radar threat regions of the multiple unmanned aerial vehicles; constructing a reward function, and constructing a Markov process model under the collaborative route planning task of the multiple unmanned aerial vehicles according to a state space, an action space, a state transfer function and the reward function in the Markov decision process;
the multi-unmanned aerial vehicle collaborative route planning module is used for constructing an evaluation function, executing search strategy iteration based on Markov decision based on the evaluation function, and searching an action sequence enabling the evaluation function value to be maximum, so that an optimal multi-unmanned aerial vehicle collaborative route is planned.
The Markov decision process model (MDP) constructed by the Markov process model building module is represented by the four-tuple M = <S, A, P, R>:
s represents a finite set of system states, including finite state points of the drone flight environment. An environment model of the drone is established in a two-dimensional coordinate system according to step S11, where different coordinate points in the environment model represent different states of the drone, and each state corresponds to an element in the set S of state spaces.
A represents the finite set of actions available to the drone. The flight of the unmanned aerial vehicle is a continuous process, but in multi-UAV route planning, once the starting point and target point of each unmanned aerial vehicle are set, the unmanned aerial vehicle is regarded as a particle during planning. Since the flight environment of the drone is built with the grid method, the drone is defined to have 8 executable actions, a = 1, 2, 3, …, 8. These actions divide the entire 360° equally, so the angle between two adjacent actions is 45°.
P is the state transition function, indicating the probability that the drone in state s_t, after performing action a_t ∈ A, transitions to state s_{t+1}. The state transition probabilities may change with the target state, threat conditions and so on. Given the current state of the drone and the executed action, the distribution of the state transition probabilities largely determines the drone's action selection at the next moment. The state transition probability can be expressed as:

P(s′|s, a) = P(s_{t+1} = s′ | s_t = s, a_t = a)  (2)

∑_{s′∈S} P(s′|s, a) = 1  (3)

where s ∈ S denotes an instance of the drone state, a ∈ A denotes an instance of the drone action, s_t denotes the state of the drone at time t, and a_t denotes the action selected by the drone at time t.
The unmanned aerial vehicle takes safely reaching the target point as its task goal, so when flying from the initial point to the target point, the movement direction of the unmanned aerial vehicle is guided by the direction of the target point. The angle between the line connecting the target point to the unmanned aerial vehicle and the x direction is defined as θ, and the unmanned aerial vehicle can be controlled to continuously adjust its actions according to the position of the target point so as to move toward it. According to θ, the 360° space around the target point is divided into 8 position states at 45° intervals, which are discretized into the target point position space T_state. The discretization rule is as follows:
[Formula (4), the discretization rule mapping θ to the 8 position states of T_state, is rendered only as an image in the original.]
When the position of the target point is known, in order to control the unmanned aerial vehicle to move toward the target point, the executable actions of the unmanned aerial vehicle are limited: the drone gives the action toward the grid cell in the target direction with high probability, and may enter the adjacent cells with a certain, lower probability. When the drone is within a given position space of the target point, it has 5 executable actions, each with a different probability, so over the 8 position spaces there are 5 × 8 = 40 action output states. The partial state transition probability design for the executable actions of the drone is shown in Table 1.
The Markov process model building module includes a reward function building module. R is the reward function, representing the immediate reward obtained given the current state and action of the drone. In a Markov model system, the reward function is a penalty or reward signal fed back by the environment after the UAV makes a motion decision and interacts with the environment. It represents the quality of the action taken by the unmanned aerial vehicle in a given state, and is an important basis for guiding the unmanned aerial vehicle to make flight decisions and safely avoid obstacles. The reward function is designed for the problems of safety and of tending toward the target point during route planning, and the reward function building module introduces reward function models with a model-free uniform structure, R_movegoal and R_avoidobstacle:
[Formulas (5) and (6), defining R_movegoal and R_avoidobstacle, are rendered only as images in the original.]
R_movegoal is the reward function model during normal flight of the unmanned aerial vehicle, and R_avoidobstacle is the reward function model when the unmanned aerial vehicle encounters a threat;
In state s, the reward function R(s, a) obtained by the drone selecting action a is expressed as follows:

R(s, a) = R_movegoal + R_avoidobstacle  (7)
In multi-UAV collaborative route planning, the unmanned aerial vehicles are exposed to radar threats throughout the flight; although the basic Markov model algorithm can plan effective paths for the unmanned aerial vehicles, those paths may still be detected by radar. Therefore, in order to further reduce the probability of the unmanned aerial vehicle being detected by radar, the reward function building module proposes and introduces the radar threat model R_threat with a non-uniform structure into the reward function:
[Formula (8), defining R_threat, is rendered only as an image in the original.]
where R_threat is the radar threat reward function model during unmanned aerial vehicle flight, giving a negative reward for the radar threats encountered in flight; L is the length of the path segment after the unmanned aerial vehicle makes its action decision; N is the number of radar threats; and d_{k/4,i} (k = 1, 2, 3) is the distance between the k/4 point of the segment and the i-th radar threat. In state s, the reward function R(s, a) obtained by the drone selecting action a is expressed as follows:

R(s, a) = R_movegoal + R_avoidobstacle + R_threat  (9)
and the multi-unmanned aerial vehicle collaborative route planning module is used for planning the effective route of the unmanned aerial vehicle by interacting the unmanned aerial vehicle action with the flight environment and finally generating a decision based on the Markov decision model. The unmanned aerial vehicle main part selects and executes the action a according to the current environment state s, so that the unmanned aerial vehicle state is transferred to s' from s, meanwhile, the reward R is obtained, and the circulation is repeated until the target state is finally reached. Namely, the cooperative route planning of multiple unmanned aerial vehicles is to find the optimal strategy pi*I.e. according to the current state of the drone, a search strategy is implemented, searching for the desired reward, i.e. the evaluation function Vπ(s) maximum sequence of actions.
The optimal strategy π* satisfies V*(s) = max_π V^π(s) for all states s ∈ S, and the evaluation function corresponding to the optimal strategy π* is called the optimal evaluation function V*(s). The generation process of the optimal strategy is called strategy iteration. The optimal strategy π* and the maximum reward V*(s) can be found using dynamic programming. In the infinite-horizon discounted model, the evaluation function V^π(s) can be described as:
V^π(s) = E[∑_{t=0}^{∞} γ^t R_t]  (10)
where γ is the discount factor and γ^t is the discount factor at time t; γ is taken as 0.9. R_t is the reward function value at time t, s is the state of the unmanned aerial vehicle at time t = 0, and s′ is the state of the unmanned aerial vehicle at the next moment. The above formula can then be rewritten recursively as:
V^π(s) = R(s, π(s)) + γ ∑_{s′∈S} P(s′|s, a) V^π(s′)  (11)
The above formula gives a method for calculating the evaluation function corresponding to a strategy, and defines the state-action value function Q^π(s, a) as an intermediate variable in solving the evaluation function. Given the initial state s and the current action a of the drone, the drone will move to the next state s′ at the next moment with probability P(s′|s, a) and follow this rule thereafter; the state-action value function Q^π(s, a) can then be expressed as:
Q^π(s, a) = R(s, a) + γ ∑_{s′∈S} P(s′|s, a) V^π(s′)  (12)

where R(s, a) is the reward obtained when the drone selects action a in state s.
At this point, the optimal strategy π*(s) of the MDP can be expressed as:
π*(s) = arg max_{a∈A} Q^π(s, a) = arg max_{a∈A} {R(s, a) + γ ∑_{s′∈S} P(s′|s, a) V^π(s′)}  (13)
Accordingly, the optimal evaluation function V*(s) can be expressed as:

V*(s) = max_{a∈A} {R(s, a) + γ ∑_{s′∈S} P(s′|s, a) V^π(s′)}  (14)
the invention provides a multi-unmanned aerial vehicle route planning algorithm based on an improved Markov decision process model, aiming at the characteristics of the multi-unmanned aerial vehicle collaborative route planning problem and the problem that the multi-unmanned aerial vehicle is easily affected by environmental threats when the multi-unmanned aerial vehicle executes combat missions in a complex environment. According to the method, starting points, task target points and radar threat areas of the multiple unmanned aerial vehicles in a flight scene are determined according to the collaborative route planning task of the multiple unmanned aerial vehicles; determining all state spaces, action spaces and state transfer functions in the motion process of the multiple unmanned aerial vehicles according to the starting points, the task target points and the radar threat areas of the multiple unmanned aerial vehicles; constructing a reward function, and constructing a Markov process model under the collaborative route planning task of the multiple unmanned aerial vehicles according to a state space, an action space, a state transfer function and the reward function in the Markov decision process; and constructing an evaluation function, executing search strategy iteration based on Markov decision based on the evaluation function, and searching an action sequence which enables the evaluation function value to be maximum, thereby planning the optimal multi-unmanned aerial vehicle collaborative air route.
To further reduce the probability of the unmanned aerial vehicle being detected by radar, a radar threat model with a non-uniform structure is proposed and introduced into the reward function; the radar threat cost is combined with the Markov model, the Markov decision model is designed accordingly, and a multi-unmanned aerial vehicle route planning algorithm based on the improved Markov decision model is obtained. Flight route planning is carried out for multiple unmanned aerial vehicles in a complex multi-threat environment, and simulation results show that route planning based on the improved Markov decision model not only quickly plans reasonable and effective flight paths for the multiple unmanned aerial vehicles, but also greatly reduces their radar threat cost and comprehensive route cost, improving the safety of the unmanned aerial vehicles when executing tasks in complex environments.
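Because the exact radar threat formula is rendered as an image in the source text, the following Python sketch only illustrates the general shape of such a segment-sampled threat reward. The 1/4, 1/2 and 3/4 sample points, the N radars and the distances d_{k/4,i} come from the text; the inverse-fourth-power decay and the weight parameter are assumptions borrowed from common radar-threat models, not the patent's formula:

```python
import math

def radar_threat_reward(p0, p1, radars, weight=1.0):
    """Assumed radar threat reward for one route segment from p0 to p1.

    radars: list of (x, y) radar positions (the N radar threats).
    The 1/d**4 decay is a stand-in for the patent's unrecovered formula.
    """
    (x0, y0), (x1, y1) = p0, p1
    L = math.hypot(x1 - x0, y1 - y0)              # segment length L after the action decision
    threat = 0.0
    for k in (1, 2, 3):                           # sample points at k/4, k = 1, 2, 3
        px = x0 + k / 4 * (x1 - x0)
        py = y0 + k / 4 * (y1 - y0)
        for rx, ry in radars:
            d = math.hypot(px - rx, py - ry)      # distance d_{k/4,i} to the ith radar
            threat += 1.0 / max(d, 1e-9) ** 4     # assumed exposure decay
    return -weight * (L / 3.0) * threat           # negative reward: threat acts as a cost
```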
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A multi-unmanned aerial vehicle collaborative route planning method based on an improved Markov decision process is characterized by comprising the following steps:
determining starting points, task target points and radar threat areas of the multiple unmanned aerial vehicles in a flight scene according to the collaborative route planning task of the multiple unmanned aerial vehicles;
determining all state spaces, action spaces and state transfer functions in the motion process of the multiple unmanned aerial vehicles according to the starting points, the task target points and the radar threat areas of the multiple unmanned aerial vehicles;
constructing a Markov decision process model under a multi-unmanned-aerial-vehicle collaborative route planning task according to a state space, an action space, a state transfer function and a pre-constructed reward function in the Markov decision process;
and executing search strategy iteration based on a Markov decision process model based on a pre-constructed evaluation function, and searching an action sequence which enables the evaluation function value to be maximum, thereby planning the optimal multi-unmanned aerial vehicle collaborative air route.
2. The multi-unmanned aerial vehicle collaborative route planning method based on the improved Markov decision process according to claim 1, wherein during the search strategy iteration the method further comprises calculating a comprehensive navigation cost function value of the multiple unmanned aerial vehicles based on a pre-constructed multi-unmanned aerial vehicle comprehensive navigation cost function, and taking the path with the minimum comprehensive navigation cost function value as the optimal multi-unmanned aerial vehicle collaborative route;
the multi-unmanned aerial vehicle comprehensive navigation cost function calculation formula is as follows:
J_i=W_1 J_{l,i}+W_2 J_{r,i}+W_3 J_t
in the formula: J_i is the comprehensive navigation cost of the multiple unmanned aerial vehicles on the ith route segment; W_1, W_2 and W_3 are the weights of the fuel cost, the threat cost and the cooperation cost respectively; J_{l,i} represents the fuel cost when the segment length of the ith route segment is l; J_{r,i} represents the threat cost when the radar threat on the ith route segment is r; J_t is the cooperation cost.
3. The multi-unmanned aerial vehicle collaborative route planning method based on the improved Markov decision process according to claim 1, wherein the Markov decision process model is represented by the four-tuple M = <S, A, P, R>:
S represents a finite set of system states, including the finite state points of the unmanned aerial vehicle flight environment;
A represents a finite set of actions available to the unmanned aerial vehicle;
P is the state transition probability function, representing the probability that, when the agent is in state s_t and performs action a_t ∈ A, it transitions to state s_{t+1};
R is the reward function, representing the immediate reward obtainable given the current state and action of the unmanned aerial vehicle.
4. The multi-unmanned aerial vehicle collaborative route planning method based on the improved Markov decision process according to claim 3, wherein the state transition probability function is expressed as:
P(s′|s,a)=P(s_{t+1}=s′|s_t=s, a_t=a)
where s denotes an instance of the unmanned aerial vehicle state, a denotes an instance of the unmanned aerial vehicle action, s_t denotes the state of the unmanned aerial vehicle at time t, and a_t denotes the action selected by the unmanned aerial vehicle at time t.
5. The multi-unmanned aerial vehicle collaborative route planning method based on the improved Markov decision process according to claim 1, wherein the reward function R comprises a reward function Rmovegoal for normal flight of the unmanned aerial vehicle and a reward function Ravoidobstacle for when the unmanned aerial vehicle encounters a threat, expressed as follows:
[formula images for Rmovegoal and Ravoidobstacle not reproduced in the source text]
In state s, the reward function R(s,a) obtained when the unmanned aerial vehicle selects action a is:
R(s,a)=Rmovegoal+Ravoidobstacle
6. The multi-unmanned aerial vehicle collaborative route planning method based on the improved Markov decision process according to claim 5, wherein the reward function further comprises a radar threat reward function Rthreat applied while the unmanned aerial vehicle is in flight, expressed as follows:
[formula image for Rthreat not reproduced in the source text]
where L is the length of the route segment after the unmanned aerial vehicle makes its action decision, N is the number of radar threats, and d_{k/4,i}, k=1, 2, 3, is the distance between the k/4 point of the route segment and the ith radar threat;
In state s, the reward function R(s,a) obtained when the unmanned aerial vehicle selects action a is:
R(s,a)=Rmovegoal+Ravoidobstacle+Rthreat
7. The multi-unmanned aerial vehicle collaborative route planning method based on the improved Markov decision process according to claim 1, wherein the pre-constructed evaluation function is expressed as:
V*(s)=max_{a∈A}{R(s,a)+γ∑_{s′∈S}P(s′|s,a)Vπ(s′)}
wherein V*(s) represents the evaluation function corresponding to the optimal strategy π*, called the optimal evaluation function; A represents the finite set of actions available to the unmanned aerial vehicle; R(s,a) is the reward function value obtained when the unmanned aerial vehicle selects action a in state s; γ is the discount factor; Vπ(s′) is the evaluation function of state s′ under strategy π;
s denotes an instance of the unmanned aerial vehicle state, a denotes an instance of the unmanned aerial vehicle action, and P(s′|s,a) is the state transition function.
8. A multi-unmanned aerial vehicle collaborative route planning device based on an improved Markov decision process is characterized by comprising an unmanned aerial vehicle collaborative system modeling module, a Markov process model building module and a multi-unmanned aerial vehicle collaborative route planning module; the unmanned aerial vehicle collaborative system modeling module is used for determining starting points, task target points and radar threat areas of the multiple unmanned aerial vehicles in a flight scene according to the multiple unmanned aerial vehicle collaborative route planning task;
the Markov process model building module is used for determining all state spaces, action spaces and state transfer functions in the motion process of the multiple unmanned aerial vehicles according to the starting points, the task target points and the radar threat regions of the multiple unmanned aerial vehicles; constructing a Markov decision process model under a multi-unmanned-aerial-vehicle collaborative route planning task according to a state space, an action space, a state transfer function and a pre-constructed reward function in the Markov decision process;
the multi-unmanned aerial vehicle collaborative route planning module is used for executing search strategy iteration of a Markov decision process model based on a pre-constructed evaluation function, searching an action sequence enabling the evaluation function value to be maximum, and planning an optimal multi-unmanned aerial vehicle collaborative route.
9. The multi-unmanned aerial vehicle collaborative route planning device according to claim 8, wherein the pre-constructed reward function R comprises a reward function Rmovegoal for normal flight of the unmanned aerial vehicle and a reward function Ravoidobstacle for when the unmanned aerial vehicle encounters a threat, expressed as follows:
[formula images for Rmovegoal and Ravoidobstacle not reproduced in the source text]
In state s, the reward function R(s,a) obtained when the unmanned aerial vehicle selects action a is:
R(s,a)=Rmovegoal+Ravoidobstacle
10. The multi-unmanned aerial vehicle collaborative route planning device according to claim 9, wherein the reward function further comprises a radar threat reward function Rthreat applied while the unmanned aerial vehicle is in flight, expressed as follows:
[formula image for Rthreat not reproduced in the source text]
where L is the length of the route segment after the unmanned aerial vehicle makes its action decision, N is the number of radar threats, and d_{k/4,i}, k=1, 2, 3, is the distance between the k/4 point of the route segment and the ith radar threat;
In state s, the reward function R(s,a) obtained when the unmanned aerial vehicle selects action a is:
R(s,a)=Rmovegoal+Ravoidobstacle+Rthreat
CN201911139552.XA 2019-11-20 2019-11-20 Multi-unmanned aerial vehicle collaborative route planning method and device in Markov decision process Pending CN112824998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911139552.XA CN112824998A (en) 2019-11-20 2019-11-20 Multi-unmanned aerial vehicle collaborative route planning method and device in Markov decision process

Publications (1)

Publication Number Publication Date
CN112824998A true CN112824998A (en) 2021-05-21

Family

ID=75906673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911139552.XA Pending CN112824998A (en) 2019-11-20 2019-11-20 Multi-unmanned aerial vehicle collaborative route planning method and device in Markov decision process

Country Status (1)

Country Link
CN (1) CN112824998A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107883962A (en) * 2017-11-08 2018-04-06 南京航空航天大学 A kind of dynamic Route planner of multi-rotor unmanned aerial vehicle under three-dimensional environment
CN108594858A (en) * 2018-07-16 2018-09-28 河南大学 The unmanned plane searching method and device of Markov moving target

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIANG YU et al.: "Collision-Free Trajectory Generation and Tracking for UAVs Using Markov Decision Process in a Cluttered Environment", Journal of Intelligent & Robotic Systems *
XIAO Zuolin et al.: "Research on Multi-UCAV Cooperative Mission Planning Technology", Tactical Missile Technology *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113703483A (en) * 2021-08-31 2021-11-26 湖南苍树航天科技有限公司 Multi-UAV collaborative trajectory planning method and system, equipment and storage medium
CN113703483B (en) * 2021-08-31 2024-04-09 湖南苍树航天科技有限公司 Multi-UAV collaborative trajectory planning method, system, equipment and storage medium
CN114200963A (en) * 2022-02-17 2022-03-18 佛山科学技术学院 Unmanned aerial vehicle autonomous mission planning method and device in dynamic environment and storage medium
CN114200963B (en) * 2022-02-17 2022-05-10 佛山科学技术学院 Unmanned aerial vehicle autonomous mission planning method and device under dynamic environment and storage medium
CN114722946A (en) * 2022-04-12 2022-07-08 中国人民解放军国防科技大学 Unmanned aerial vehicle asynchronous action and cooperation strategy synthesis method based on probability model detection
CN117075596A (en) * 2023-05-24 2023-11-17 陕西科技大学 Method and system for planning complex task path of robot under uncertain environment and motion
CN117075596B (en) * 2023-05-24 2024-04-26 陕西科技大学 Method and system for planning complex task path of robot under uncertain environment and motion
CN116882607A (en) * 2023-07-11 2023-10-13 中国人民解放军军事科学院系统工程研究院 Key node identification method based on path planning task
CN116882607B (en) * 2023-07-11 2024-02-02 中国人民解放军军事科学院系统工程研究院 Key node identification method based on path planning task

Similar Documents

Publication Publication Date Title
CN112824998A (en) Multi-unmanned aerial vehicle collaborative route planning method and device in Markov decision process
Zhang et al. Cooperative and geometric learning algorithm (CGLA) for path planning of UAVs with limited information
Duan et al. Imperialist competitive algorithm optimized artificial neural networks for UCAV global path planning
CN107168380B (en) Multi-step optimization method for coverage of unmanned aerial vehicle cluster area based on ant colony algorithm
CN101286071B (en) Multiple no-manned plane three-dimensional formation reconfiguration method based on particle swarm optimization and genetic algorithm
CN108153328B (en) Multi-missile collaborative track planning method based on segmented Bezier curve
CN109917806B (en) Unmanned aerial vehicle cluster formation control method based on non-inferior solution pigeon swarm optimization
CN106705970A (en) Multi-UAV(Unmanned Aerial Vehicle) cooperation path planning method based on ant colony algorithm
CN110687923A (en) Unmanned aerial vehicle long-distance tracking flight method, device, equipment and storage medium
Liu et al. Adaptive path planning for unmanned aerial vehicles based on bi-level programming and variable planning time interval
CN110908395A (en) Improved unmanned aerial vehicle flight path real-time planning method
CN102880186A (en) Flight path planning method based on sparse A* algorithm and genetic algorithm
CN112783213B (en) Multi-unmanned aerial vehicle cooperative wide-area moving target searching method based on hybrid mechanism
CN112947592A (en) Reentry vehicle trajectory planning method based on reinforcement learning
CN114740883B (en) Coordinated point reconnaissance task planning cross-layer joint optimization method
Saito et al. A LiDAR based mobile area decision method for TLS-DQN: improving control for AAV mobility
CN115562357A (en) Intelligent path planning method for unmanned aerial vehicle cluster
Zhong et al. Method of multi-UAVs cooperative search for Markov moving targets
Zhang et al. Design of the fruit fly optimization algorithm based path planner for UAV in 3D environments
Wei et al. UCAV formation online collaborative trajectory planning using hp adaptive pseudospectral method
Zhou et al. Multi-UAVs formation autonomous control method based on RQPSO-FSM-DMPC
Yu et al. UAV path planning using GSO-DE algorithm
Fu et al. Obstacle avoidance and collision avoidance of UAV swarm based on improved VFH algorithm and information sharing strategy
Zollars et al. Optimal Path Planning for SUAS Target Observation through Constrained Urban Environments using Simplex Methods
Li et al. A path planning for one UAV based on geometric algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210521