CN114020013B - Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning - Google Patents
Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning Download PDFInfo
- Publication number
- CN114020013B (application CN202111246299.5A)
- Authority
- CN
- China
- Legal status: Active (assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
- G05D1/104—Simultaneous control of position or course in three dimensions specially adapted for aircraft involving a plurality of aircrafts, e.g. formation flying
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides an unmanned aerial vehicle (UAV) formation collision avoidance method based on deep reinforcement learning, comprising the following steps: outputting a strategy that enables each UAV to avoid collisions autonomously in flight, with different constraint conditions set so that the UAVs can hold formation; training the UAVs in a simulation environment, generating a collision-avoidance behaviour strategy by assigning different reward values to different behaviours, and recording the UAVs' state information and collision-avoidance strategies; processing external environment information with an LSTM, a form of recurrent neural network, and continuing training from the initial strategy in combination with the UAVs' state information; and adding further constraint conditions on top of collision avoidance so that the UAVs hold a given formation while avoiding collisions within the team, with the model continuously run and optimized. The invention effectively unifies UAV collision avoidance and formation keeping, integrates resources effectively, and adjusts individual behaviour in real time to obtain the optimal collision-avoidance behaviour.
Description
Technical Field
The invention relates to the field of deep reinforcement learning and the technical field of unmanned aerial vehicles, in particular to an unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning.
Background
In recent years, multi-agent systems have been studied increasingly intensively because of their great potential in many fields, including cooperative exploration for surveillance and rescue, satellite-cluster cooperative control, and UAV formation control. The basic idea of a multi-agent system is to solve, through the cooperation of individuals, complex tasks that a single agent could not accomplish even with expensive equipment. Formation control is a fundamental problem for multi-agent systems; its goal is to reach and maintain a formation shape that lets the system jointly accomplish a specific task, and formation keeping is an important issue within it. In addition, collision avoidance must be considered to guarantee the safety of the system. Because of the interactions between agents and the trade-off between collision avoidance and formation keeping, finding collision-free, time-efficient paths in an uncertain dynamic environment remains a challenge.
For the formation-keeping problem, several formation-control techniques have been proposed in the literature, including behaviour-based formation control, the virtual-structure method, and formation-control schemes based on a leader-follower architecture. Among these, the leader-follower architecture is widely used for its simple structure and practicality. Although a series of advances has been made in leader-follower formation control, the problem of formation control with collision avoidance has not been fully studied in previous work. Especially in dynamic environments, collisions between agents, and between the multi-agent system and obstacles, make collision avoidance increasingly difficult.
For the collision-avoidance problem, conventional algorithms generally fall into three classes: offline planning methods, methods based on artificial potential fields, and sense-and-avoid methods. Offline planning methods compute a collision-free trajectory in advance and use it as the desired trajectory for a subsequent tracking controller; however, they are computationally intensive and require the whole environment to be known in advance, which makes them impractical in dynamic environments. Methods based on artificial potential fields avoid collisions by assuming virtual attractive and repulsive fields between the individuals and the environment; however, they can get stuck in local minima, and sometimes the destination becomes unreachable. Sense-and-avoid methods solve the collision-avoidance problem by sensing the environment and adjusting the current action accordingly, in a human-like way. Work on these methods divides into two categories, reaction-based and prediction-based. Reaction-based methods avoid collisions by applying movement rules to the current state, for example collision avoidance based on fuzzy logic or the reciprocal velocity obstacle method; because they ignore future states, they have limitations and may be unreliable in some situations. Prediction-based methods forecast the motion of obstacles and the future state, and then output long-term actions to avoid collision; however, they suffer from two evident problems: inaccurate estimates caused by various uncertainties, and the enormous computational complexity of the prediction step.
Therefore, traditional collision-avoidance methods are strongly limited and lack formation-control capability, so the focus of research on formation collision avoidance has gradually shifted to the field of reinforcement learning.
Disclosure of Invention
Aiming at the collision-avoidance problem of multi-UAV formations, the invention provides a UAV formation collision avoidance method based on deep reinforcement learning, which coordinates the control of the whole UAV formation so that collisions are avoided and tasks are completed smoothly.
In order to achieve the above object, the present invention adopts the following technical scheme:
an unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning comprises the following specific steps:
step one: selecting a deep reinforcement learning model as the main framework, setting initial parameters according to established experiments in the field, explicitly training a strategy that enables the UAV to avoid collisions autonomously in flight, and, on this basis, enabling the UAVs to hold formation by setting different constraint conditions;
step two: training the UAV in a simulation environment through imitation learning, so that the UAV operates by imitating the choices a human would make; gradually generating a collision-avoidance behaviour strategy by assigning different reward values to different behaviours; and recording and storing the UAV's state information and collision-avoidance strategies to serve as input for the subsequent learning model;
step three: processing the external environment information, mainly the state information of the obstacles, with an LSTM, a form of recurrent neural network, and then training the UAV from the initial strategy in combination with the UAV state information of step two; during training a second-order dynamic model is used to adjust the UAV's velocity so that velocity changes are smooth, and the training objective is for the UAV to reach the target position along a shorter path;
Long short-term memory (LSTM) is a special kind of RNN, designed mainly to overcome the vanishing- and exploding-gradient problems that arise when training on long sequences. In short, an LSTM performs better than an ordinary RNN on longer sequences.
Step four: adding different constraint conditions on top of collision avoidance, so that the UAVs hold a given formation while avoiding collisions within the team; optimization continues as the model runs, and the expected output is a flexible flight strategy that holds the formation and returns to the correct path after an avoidance manoeuvre. In step one, the environment comprises a leader, a follower and an obstacle, denoted by the superscripts L, F and O, respectively;
the state space of the UAV at time $t$ is denoted $s_t$ and its action space $a_t$; the other parameters of the training environment are: $t$, the time; $\Delta t$, the time step; $p_t=[p_x,p_y]$, the position of the UAV at time $t$; $v_t=[v_x,v_y]$, its velocity at time $t$; $r$, its occupied radius; $p_g=[p_{gx},p_{gy}]$, the target position; $v_{pref}$, the preferred speed; $\theta_t$, the heading angle; and $s_t^{F}$, $s_t^{L}$, $s_t^{O}$, the state spaces of the follower, the leader and the obstacle, respectively;

its state information at time $t$ is expressed as $s_t=[s_t^{o},s_t^{h}]$, where $s_t^{o}$ denotes the state information that can be observed by the other UAVs and $s_t^{h}$ denotes the hidden state information that cannot be observed;
for the UAV action $a_t$, the UAV is assumed to respond quickly once a control command is received, so the action is set to the commanded velocity, $a_t=[v_x,v_y]$; the goal of training is to design the follower strategy $\pi: s_t \mapsto a_t$ so as to select appropriate actions that maintain the formation and avoid obstacles;
in the learning architecture, the problem is converted into the optimization of an objective function, composed of multiple terms, under a set of constraints: the objective combines the time $t_g$ the follower needs to reach the target with the accumulated formation-keeping error, while the constraints include the collision-avoidance requirement;
the objective function of formation collision avoidance is as follows:

$$\min_{\pi}\ t_g+\sum_{t}\big\|p_t^{F}-p_t^{L}-H_t\big\| \qquad (1.1)$$
$$\text{s.t.}\quad \big\|p_t^{F}-\tilde p_t\big\|\ge 2r \qquad (1.2)$$
$$p_{t_g}^{F}=p_g \qquad (1.3)$$
$$p_{t+\Delta t}^{F}=p_t^{F}+\Delta t\,a_t \qquad (1.4)$$

In formula (1.2), $\tilde p_t$ denotes the positions of the other UAVs in the environment, excluding the follower itself, and $H_t$ denotes the desired relative offset vector of the follower with respect to the leader; (1.2) is the collision-avoidance constraint, (1.3) the constraint of reaching the target position, and (1.4) the kinematic constraint of the UAV.
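As an illustrative sketch, not part of the claimed method and with all function names hypothetical, the collision-avoidance constraint (keeping at least twice the occupied radius, $2r$, from every other UAV) and the target-reaching constraint can be checked as:

```python
import math

def collision_free(p_follower, others, r):
    """Collision-avoidance constraint: the follower keeps at least 2r
    (the sum of two occupied radii) from every other UAV."""
    return all(math.dist(p_follower, p) >= 2 * r for p in others)

def reached_target(p_follower, p_g, tol=1e-6):
    """Target constraint: the follower has arrived at the target position."""
    return math.dist(p_follower, p_g) <= tol
```

A strategy satisfying the constraints would keep `collision_free` true at every time step and make `reached_target` true at time $t_g$.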
The second step specifically comprises the following steps:
first, the joint state space of the UAV is defined as $s_t^{jn}=\big[s_t,\tilde s_t^{F},\tilde s_t^{O}\big]$, where $\tilde s_t^{F}$ denotes the observable states of all followers and $\tilde s_t^{O}$ the observable states of the obstacles;
Secondly, a value network is designed to estimate the value of the state space, the purpose of the value network is to find the optimal value function, and the definition of the value function is as follows:
in the formula (1.5), the amino acid sequence,representing the rewards acquired at time t, gamma representing the discount factor;
the optimal strategy $\pi^{*}: s_t^{jn}\mapsto a_t$ is obtained iteratively from the value function:

$$\pi^{*}\big(s_t^{jn}\big)=\arg\max_{a_t}\Big[R_t\big(s_t^{jn},a_t\big)+\gamma\sum_{s_{t+\Delta t}^{jn}}P\big(s_{t+\Delta t}^{jn}\mid s_t^{jn},a_t\big)\,V^{*}\big(s_{t+\Delta t}^{jn}\big)\Big] \qquad (1.6)$$

in formula (1.6), $P\big(s_{t+\Delta t}^{jn}\mid s_t^{jn},a_t\big)$ denotes the transition probability between times $t$ and $t+\Delta t$;
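To illustrate the iterative use of the value function, the following minimal one-step lookahead selects the action maximising reward plus discounted successor value, assuming a deterministic, known transition model (a simplification of the transition probability); all callables are hypothetical stand-ins:

```python
def greedy_action(state, actions, step, reward, value, gamma=0.9):
    """One-step lookahead: pick the action maximising the immediate
    reward plus the discounted value of the deterministic successor."""
    return max(actions, key=lambda a: reward(state, a) + gamma * value(step(state, a)))
```

For example, on a number line with goal 3, `greedy_action` steers toward the goal because the value function rewards proximity.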
Finally, solving the problem of formation control by adopting a formation evaluation function based on the idea of reinforcement learning, wherein the formation evaluation function is used for evaluating the quality of the formation and calculating the rewards of the formation and reflecting the errors of the formation track in real time; taking Euclidean distance between the target position and the actual position as an input; the constructed reward function for formation is defined as:
in the formula (1.7), the amino acid sequence,a formation error value formed at time t;
the reward function for collision avoidance is expressed as follows:
in the formula (1.8)Representing a minimum distance between the follower and the other drone;
combining equations (1.7) and (1.8) to obtain a complete bonus function R t The method comprises the following steps:
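A minimal sketch of the reward shaping described above, assuming a simple additive combination, a unit collision penalty, and the Euclidean formation error as the formation term; the function names and exact penalty values are illustrative assumptions, not the patented reward:

```python
import math

def formation_reward(p_actual, p_desired):
    """Formation term: penalise the Euclidean formation error
    between the actual and the desired position."""
    return -math.dist(p_actual, p_desired)

def collision_reward(d_min, r):
    """Collision term: fixed penalty when the minimum distance to
    another UAV falls below the safety threshold 2r."""
    return -1.0 if d_min < 2 * r else 0.0

def total_reward(p_actual, p_desired, d_min, r):
    """Complete reward: sum of the formation and collision terms."""
    return formation_reward(p_actual, p_desired) + collision_reward(d_min, r)
```

With this shaping, a follower in perfect formation and clear of all neighbours receives zero reward, and every deviation or near-collision is penalised.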
in the third step, the second-order dynamic model is:

$$\dot P^{F}=V^{F},\qquad \dot V^{F}=U^{F} \qquad (1.10)$$

in formula (1.10), $P^{F}$, $V^{F}$ and $U^{F}$ denote the position, velocity and control-input vectors of the follower, respectively; correspondingly, $P^{L}$ and $V^{L}$ denote the position and velocity vectors of the leader; the follower should keep a fixed distance from the leader according to the formation to be maintained, and $H_p=[H_x,H_y]^{T}$ denotes the relative offset vector the follower must maintain with respect to the leader;

suppose $\xi^{F}=[(P^{F})^{T},(V^{F})^{T}]^{T}$ denotes the position and velocity of the follower and $\xi^{L}=[(P^{L})^{T},(V^{L})^{T}]^{T}$ those of the leader; their desired relative offset is $H=[H_p^{T},\mathbf 0^{T}]^{T}$; for any given initial states of the follower and the leader, the condition for maintaining the formation is:

$$\lim_{t\to\infty}\big(\xi^{F}(t)-\xi^{L}(t)-H\big)=0$$

from this condition a control protocol is assumed, where $k_1,k_2>0$:

$$U^{F}=\dot V^{L}-k_1\big(P^{F}-P^{L}-H_p\big)-k_2\big(V^{F}-V^{L}\big)$$
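A short numerical check of such a leader-follower protocol, here in one dimension with a constant-velocity leader (so the leader-acceleration feed-forward term vanishes); the gains, offset and initial states are illustrative assumptions:

```python
def simulate(k1=2.0, k2=3.0, dt=0.01, steps=4000):
    """Euler-integrate a 1-D double-integrator follower under the
    protocol u = -k1*(pF - pL - H) - k2*(vF - vL); the leader moves
    at constant velocity, so the follower's offset converges to H."""
    H = -2.0                      # desired offset behind the leader
    pL, vL = 0.0, 1.0             # leader state
    pF, vF = -5.0, 0.0            # follower starts out of formation
    for _ in range(steps):
        u = -k1 * (pF - pL - H) - k2 * (vF - vL)
        pF += vF * dt             # double-integrator dynamics
        vF += u * dt
        pL += vL * dt
    return pF - pL                # should approach H
```

With these gains, the tracking error obeys a stable second-order equation, so the returned offset settles at the desired value.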
Compared with the prior art, the invention has the advantages that:
UAV collision-avoidance behaviour is hard to unify effectively with UAV formation behaviour: traditional collision-avoidance approaches lack flexibility and cannot be integrated smoothly with a UAV formation system, while most traditional formation control is based on control theory, leans toward fixed, unchanging motion tasks, and lacks dynamic adjustment to dynamic environments. Approaching the problem instead from the field of deep reinforcement learning, the proposed UAV formation collision avoidance method unifies collision avoidance and formation keeping effectively, integrates resources effectively, and adjusts individual behaviour in real time to obtain the optimal collision-avoidance behaviour, greatly improving the mobility of the cluster system and its adaptability to complex environments.
Drawings
FIG. 1 is a general idea of the formation collision avoidance of the present invention;
FIG. 2 is a flow chart of processing data by the LSTM module of the present invention;
fig. 3 is a general algorithm block diagram of the formation collision avoidance of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the accompanying drawings and by way of examples in order to make the objects, technical solutions and advantages of the invention more apparent.
The unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning comprises the following specific steps:
Step one: first, a deep reinforcement learning model is selected as the main framework; then initial parameters are set according to established experiments in the field, and a strategy that enables the UAV to avoid collisions autonomously in flight is explicitly output; on this basis, setting different constraint conditions allows the UAVs to hold formation to a certain extent. The environment of the invention includes a leader, a follower and an obstacle, distinguished for convenience by the superscripts F, L and O.
The state space of the UAV at time $t$ can be expressed as $s_t$ and its action space as $a_t$; the other parameters of the training environment are shown in Table 1.
TABLE 1 parameter details list
Table 1 clarifies the various quantities used while the UAV operates. The state information of the UAV at time $t$ can be expressed as $s_t=[s_t^{o},s_t^{h}]$, where $s_t^{o}$ denotes the state information that can be observed by the other UAVs and $s_t^{h}$ denotes the hidden state information that cannot be observed.
For the UAV action $a_t$, the UAV is assumed to respond quickly once a control command is received, so the action is set to the commanded velocity, $a_t=[v_x,v_y]$. The goal of training is to design the strategy $\pi: s_t \mapsto a_t$ so as to select appropriate actions that maintain the formation and avoid obstacles.
In the learning architecture, this problem can be converted into the optimization of an objective function, composed of multiple terms, under a set of constraints: the objective combines the time $t_g$ the follower needs to reach the target with the accumulated formation-keeping error, while the constraints include the collision-avoidance requirement.
Thus, the objective function of formation collision avoidance is as follows:

$$\min_{\pi}\ t_g+\sum_{t}\big\|p_t^{F}-p_t^{L}-H_t\big\| \qquad (1.1)$$
$$\text{s.t.}\quad \big\|p_t^{F}-\tilde p_t\big\|\ge 2r \qquad (1.2)$$
$$p_{t_g}^{F}=p_g \qquad (1.3)$$
$$p_{t+\Delta t}^{F}=p_t^{F}+\Delta t\,a_t \qquad (1.4)$$

In formula (1.2), $\tilde p_t$ denotes the positions of the other UAVs in the environment, excluding the follower itself, and $H_t$ denotes the desired relative offset vector of the follower with respect to the leader. (1.2) is the collision-avoidance constraint, (1.3) the constraint of reaching the target position, and (1.4) the kinematic constraint of the UAV.
Step two: the UAV is trained in a simulation environment through imitation learning, so that it operates by imitating the choices a human would make; a collision-avoidance behaviour strategy is gradually generated by assigning different reward values to different behaviours; the UAV's state information and collision-avoidance strategies are then recorded and stored to serve as input for the subsequent learning model.
First, the joint state space of the UAV is defined as $s_t^{jn}=\big[s_t,\tilde s_t^{F},\tilde s_t^{O}\big]$, where $\tilde s_t^{F}$ denotes the observable states of all followers and $\tilde s_t^{O}$ the observable states of the obstacles.
Secondly, a value network is designed to estimate the value of the state space; its purpose is to find the optimal value function, which is defined as:

$$V^{*}\big(s_t^{jn}\big)=\mathbb E\Big[\sum_{k=t}^{T}\gamma^{\,k-t}R_k\Big] \qquad (1.5)$$

In formula (1.5), $R_t$ denotes the reward obtained at time $t$ and $\gamma$ the discount factor.
The optimal strategy $\pi^{*}: s_t^{jn}\mapsto a_t$ can be obtained iteratively from the value function:

$$\pi^{*}\big(s_t^{jn}\big)=\arg\max_{a_t}\Big[R_t\big(s_t^{jn},a_t\big)+\gamma\sum_{s_{t+\Delta t}^{jn}}P\big(s_{t+\Delta t}^{jn}\mid s_t^{jn},a_t\big)\,V^{*}\big(s_{t+\Delta t}^{jn}\big)\Big] \qquad (1.6)$$

In formula (1.6), $P\big(s_{t+\Delta t}^{jn}\mid s_t^{jn},a_t\big)$ denotes the transition probability between times $t$ and $t+\Delta t$.
Finally, based on the idea of reinforcement learning, the formation-control problem is solved with a formation evaluation function: in essence it evaluates the quality of the formation and computes the formation reward, and in particular it reflects the error of the formation track in real time. It takes the Euclidean distance between the target position and the actual position as input. The constructed formation reward function is defined as:

$$R_t^{f}=-\big\|e_t^{f}\big\| \qquad (1.7)$$

In formula (1.7), $e_t^{f}$ is the formation error at time $t$. The collision-avoidance reward function is expressed as follows:

$$R_t^{c}=\begin{cases}-1, & d_t^{\min}<2r\\ 0, & \text{otherwise}\end{cases} \qquad (1.8)$$

In formula (1.8), $d_t^{\min}$ denotes the minimum distance between the follower and the other UAVs. Combining formulas (1.7) and (1.8) yields the complete reward function:

$$R_t=R_t^{f}+R_t^{c} \qquad (1.9)$$
the idea of combining collision avoidance and clustering can be considered that unmanned aerial vehicle clusters are collision avoidance is shown in fig. 1.
Step three: the external environment information, mainly the state information of the obstacles, is processed with an LSTM, a form of recurrent neural network, and the UAV is then trained from the initial strategy in combination with the UAV state information of step two; during training a second-order dynamic model is used to adjust the UAV's velocity so that velocity changes are smooth, and the training objective is for the UAV to reach the target position along a shorter path.
At time $t$, the states of the obstacles are treated as the input sequence of the LSTM network; as shown in fig. 2, the LSTM processes the obstacle state information one element at a time and finally produces an encoding of all obstacles. The LSTM network can thus handle a variable number of obstacles, avoiding the need to rebuild the network whenever the obstacle count changes.
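This variable-length encoding step can be sketched with a minimal, untrained LSTM cell in pure Python; the class, its random weights and dimensions are illustrative assumptions, showing only that obstacle lists of any length fold into a fixed-size vector:

```python
import math
import random

def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTM:
    """Minimal LSTM cell (illustrative, untrained random weights):
    folds a variable-length sequence of obstacle states into one
    fixed-size hidden vector, as described for the encoding step."""

    def __init__(self, n_in, n_hid, seed=0):
        rng = random.Random(seed)
        self.n_in, self.n_hid = n_in, n_hid
        # one weight matrix per gate: input, forget, output, candidate
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hid)]
                      for _ in range(n_hid)] for g in "ifoc"}

    def _gate(self, g, xh):
        return [sum(w * v for w, v in zip(row, xh)) for row in self.W[g]]

    def encode(self, obstacles):
        h = [0.0] * self.n_hid
        c = [0.0] * self.n_hid
        for x in obstacles:            # process obstacle states one by one
            xh = list(x) + h
            i = [_sigmoid(v) for v in self._gate("i", xh)]
            f = [_sigmoid(v) for v in self._gate("f", xh)]
            o = [_sigmoid(v) for v in self._gate("o", xh)]
            g = [math.tanh(v) for v in self._gate("c", xh)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h                        # fixed size regardless of count
```

In practice a trained deep-learning framework would replace this sketch; the point is that two obstacles and five obstacles both yield an encoding of the same dimension.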
The selected second-order dynamic model is as follows:

$$\dot P^{F}=V^{F},\qquad \dot V^{F}=U^{F} \qquad (1.10)$$

In formula (1.10), $P^{F}$, $V^{F}$ and $U^{F}$ denote the position, velocity and control-input vectors of the follower, respectively; correspondingly, $P^{L}$ and $V^{L}$ denote the position and velocity vectors of the leader. The follower should keep a fixed distance from the leader according to the formation to be maintained; thus $H_p=[H_x,H_y]^{T}$ denotes the relative offset vector the follower must maintain with respect to the leader.

Suppose $\xi^{F}=[(P^{F})^{T},(V^{F})^{T}]^{T}$ denotes the position and velocity of the follower and $\xi^{L}=[(P^{L})^{T},(V^{L})^{T}]^{T}$ those of the leader; their desired relative offset is $H=[H_p^{T},\mathbf 0^{T}]^{T}$. For any given initial states of the follower and the leader, the condition for maintaining the formation is:

$$\lim_{t\to\infty}\big(\xi^{F}(t)-\xi^{L}(t)-H\big)=0$$

From the control conditions, the following control protocol is assumed, where $k_1,k_2>0$:

$$U^{F}=\dot V^{L}-k_1\big(P^{F}-P^{L}-H_p\big)-k_2\big(V^{F}-V^{L}\big)$$
Step four: finally, different constraint conditions are added on top of collision avoidance, so that the UAVs hold a given formation while avoiding collisions within the team; optimization continues as the model runs, and the expected output is a flexible flight strategy that holds the formation and returns to the correct path after an avoidance manoeuvre.
The main body frame of the unmanned aerial vehicle formation collision avoidance algorithm is shown in fig. 3.
The training process in the above steps can be subdivided as follows:
(1) Execute the formation-based optimal reciprocal collision avoidance strategy algorithm and collect a demonstration data set $D$; then go to (2).
(2) Initialize the value network $V$ from the demonstration data set $D$; then go to (3).
(3) Initialize the target value network $\hat V$; then go to (4).
(4) Initialize the experience replay memory $M$, used to break the correlation between samples; then go to (5).
(5) Loop: if the maximum number of iterations has not been reached, execute the following steps; otherwise exit and return the value function $V$.
a) Initialize a random training scenario.
b) Repeat the following steps until success or timeout:
i. select a behaviour: choose the optimal behaviour $a_t$ by maximizing the sum of the value function and the reward;
ii. store the transition tuple $(s_t^{jn}, a_t, R_t, s_{t+\Delta t}^{jn})$ in the experience replay memory $M$;
iii. randomly draw a training batch from $M$;
iv. set the target value of the output;
v. perform gradient descent on the loss function.
c) Every $C$ training iterations, update the target value network $\hat V$ once.
d) End; return the value function $V$.
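The training loop above can be sketched, heavily simplified, in pure Python: a tabular value function on a toy one-dimensional chain stands in for the value network, and a temporal-difference update on random minibatches stands in for the gradient-descent step; the environment, batch size, learning rate and all other details beyond the loop structure are illustrative assumptions:

```python
import random
from collections import deque

def train_toy(episodes=300, gamma=0.9, alpha=0.5, batch=16, seed=1):
    """Skeleton of steps (1)-(5): interact with a toy environment,
    store transitions in a replay memory, and update a (here: tabular)
    value function from random minibatches drawn from that memory."""
    rng = random.Random(seed)
    goal, V = 5, [0.0] * 6            # 1-D chain of states 0..5
    memory = deque(maxlen=1000)       # experience replay memory M
    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.choice([-1, 1])   # exploratory behaviour selection
            s2 = min(max(s + a, 0), goal)
            r = 1.0 if s2 == goal else 0.0
            memory.append((s, r, s2)) # store the transition tuple
            s = s2
            # gradient-descent stand-in: TD update on a random batch
            for (bs, br, bs2) in rng.sample(memory, min(batch, len(memory))):
                target = br + (0.0 if bs2 == goal else gamma * V[bs2])
                V[bs] += alpha * (target - V[bs])
    return V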
Those of ordinary skill in the art will appreciate that the embodiments described herein are intended to aid the reader in understanding the practice of the invention and that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations from the teachings of the present disclosure without departing from the spirit thereof, and such modifications and combinations remain within the scope of the present disclosure.
Claims (1)
1. An unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning, characterized by comprising the following steps:
step one: selecting a deep reinforcement learning model as the main framework, setting initial parameters according to established experiments in the field, explicitly training a strategy that enables the UAV to avoid collisions autonomously in flight, and, on this basis, enabling the UAVs to hold formation by setting different constraint conditions;
in the first step, the environment comprises a leader, a follower and an obstacle, which are respectively represented by superscripts L, F and O;
the state space of the UAV at time $t$ is expressed as $s_t$ and its action space as $a_t$; the other parameters of the training environment are: $t$, the time; $\Delta t$, the time step; $p_t=[p_x,p_y]$, the position of the UAV at time $t$; $v_t=[v_x,v_y]$, its velocity at time $t$; $r$, its occupied radius; $p_g=[p_{gx},p_{gy}]$, the target position; $v_{pref}$, the preferred speed; $\theta_t$, the heading angle; and $s_t^{F}$, $s_t^{L}$, $s_t^{O}$, the state spaces of the follower, the leader and the obstacle, respectively;

its state information at time $t$ is expressed as $s_t=[s_t^{o},s_t^{h}]$, where $s_t^{o}$ denotes the state information that can be observed by the other UAVs and $s_t^{h}$ denotes the hidden state information that cannot be observed;

for the UAV action $a_t$, the UAV is assumed to respond quickly once a control command is received, so the action is set to the commanded velocity, $a_t=[v_x,v_y]$; the goal of training is to design the strategy $\pi: s_t \mapsto a_t$ so as to select appropriate actions that maintain the formation and avoid obstacles;
in the learning architecture, the problem is converted into the optimization of an objective function, composed of multiple terms, under a set of constraints: the objective combines the time $t_g$ the follower needs to reach the target with the accumulated formation-keeping error, while the constraints include the collision-avoidance requirement;
the objective function of formation collision avoidance is as follows:

$$\min_{\pi}\ t_g+\sum_{t}\big\|p_t^{F}-p_t^{L}-H_t\big\| \qquad (1.1)$$
$$\text{s.t.}\quad \big\|p_t^{F}-\tilde p_t\big\|\ge 2r \qquad (1.2)$$
$$p_{t_g}^{F}=p_g \qquad (1.3)$$
$$p_{t+\Delta t}^{F}=p_t^{F}+\Delta t\,a_t \qquad (1.4)$$

in formula (1.2), $\tilde p_t$ denotes the positions of the other UAVs in the environment, excluding the follower itself, and $H_t$ denotes the desired relative offset vector of the follower with respect to the leader; (1.2) is the collision-avoidance constraint, (1.3) the constraint of reaching the target position, and (1.4) the kinematic constraint of the UAV;
step two: training the UAV in a simulation environment through imitation learning, so that the UAV operates by imitating the choices a human would make; gradually generating a collision-avoidance behaviour strategy by assigning different reward values to different behaviours; and recording and storing the UAV's state information and collision-avoidance strategies to serve as input for the subsequent learning model;
the second step specifically comprises the following steps:
first, the joint state space of the unmanned aerial vehicles is defined, consisting of the observable space of all followers together with the observable space of the obstacles;
secondly, a value network is designed to estimate the value of the state space; the purpose of the value network is to find the optimal value function, where the value function is defined as:

V(s_t) = E[ Σ_{k=0}^{∞} γ^{kΔt} · R_{t+kΔt} ]  (1.5)

in formula (1.5), R_t represents the reward acquired at time t and γ represents the discount factor;
the optimal strategy π*: s_t → a_t is acquired iteratively from the value function:

π*(s_t) = argmax_{a_t} [ R(s_t, a_t) + γ^{Δt} ∫ P(s_{t+Δt} | s_t, a_t) · V*(s_{t+Δt}) ds_{t+Δt} ]  (1.6)

in formula (1.6), P(s_{t+Δt} | s_t, a_t) represents the transition probability between time t and t+Δt;
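On a finite state space, the iterative rule of formula (1.6) reduces to classical value iteration: the integral over successor states becomes a transition-weighted sum. The tiny two-state MDP used to exercise it below is a made-up illustration, not the UAV state space:

```python
def value_iteration(n_states, actions, P, R, gamma=0.9, iters=100):
    """P[s][a] is a list of (prob, next_state) pairs; R[s][a] is the reward.
    Returns the converged values and the greedy (optimal) policy."""
    V = [0.0] * n_states
    for _ in range(iters):
        # Bellman optimality backup: max over actions of reward plus
        # discounted expected value of the successor state (cf. (1.6))
        V = [max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in actions) for s in range(n_states)]
    policy = [max(actions, key=lambda a: R[s][a] +
                  gamma * sum(p * V[s2] for p, s2 in P[s][a]))
              for s in range(n_states)]
    return V, policy
```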
finally, based on the idea of reinforcement learning, a formation evaluation function is adopted to solve the formation control problem; this function evaluates the quality of the formation, calculates the formation reward, and reflects the formation trajectory error in real time; the Euclidean distance between the target position and the actual position is taken as its input; the constructed reward function for the formation is defined as:
in formula (1.7), the input term represents the formation error value formed at time t;
the reward function for collision avoidance is expressed as follows:
in formula (1.8), the distance term represents the minimum distance between the follower and the other unmanned aerial vehicles;
formulas (1.7) and (1.8) are combined to obtain the complete reward function R_t:
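The combined reward can be sketched as follows; the penalty shape, safety threshold d_safe and penalty magnitude are illustrative assumptions, since the patent's exact expressions (1.7)-(1.9) are given only as images:

```python
import math

def formation_reward(target_pos, actual_pos):
    """Formation term (cf. (1.7)): penalize the Euclidean distance between
    the target position and the actual position."""
    return -math.dist(target_pos, actual_pos)

def collision_reward(min_distance, d_safe=1.0, penalty=-10.0):
    """Collision term (cf. (1.8)): a large penalty when the minimum
    distance to another UAV falls below the assumed safety threshold."""
    return penalty if min_distance < d_safe else 0.0

def total_reward(target_pos, actual_pos, min_distance):
    """Complete reward R_t (cf. (1.9)): sum of the two terms."""
    return formation_reward(target_pos, actual_pos) + collision_reward(min_distance)
```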
step three: the external environment information is processed by the LSTM structure of a recurrent neural network; combined with the state information of the unmanned aerial vehicle from step two, the unmanned aerial vehicle is then trained on the basis of the initial strategy; during training, a second-order dynamics model is adopted to adjust the speed of the unmanned aerial vehicle so as to obtain smooth speed changes; the expected training outcome is that the unmanned aerial vehicle reaches the target position along a shorter path;
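To illustrate why an LSTM suits this step, the toy single-unit cell below folds a variable-length sequence of environment observations (e.g. per-obstacle readings) into one fixed-size hidden state. Sharing one scalar weight across all four gates is a deliberate simplification for readability; a real implementation would use a deep-learning library with full weight matrices:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w=1.0, u=0.5, b=0.0):
    """One LSTM step with shared scalar weights for all four gates (a toy
    simplification); returns the new hidden and cell states."""
    f = sigmoid(w * x + u * h + b)    # forget gate
    i = sigmoid(w * x + u * h + b)    # input gate
    o = sigmoid(w * x + u * h + b)    # output gate
    g = math.tanh(w * x + u * h + b)  # candidate cell state
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

def encode_sequence(xs):
    """Run the cell over a sequence of observations and return the final
    hidden state, a fixed-size summary regardless of sequence length."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c)
    return h
```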
in the third step, the second-order dynamics model is as follows:

dP^F/dt = V^F,  dV^F/dt = U^F  (1.10)

in formula (1.10), P^F, V^F and U^F represent the position, velocity and control input vectors of the follower, respectively; correspondingly, P^L and V^L represent the position and velocity vectors of the leader; according to the formation to be maintained, the follower should keep a certain distance from the leader, and H_p = [H_x, H_y]^T represents the relative offset vector that the follower needs to maintain with respect to the leader;
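The double-integrator dynamics can be integrated with a simple Euler step; the time step value is an assumed simulation parameter:

```python
def step_dynamics(p, v, u, dt=0.1):
    """One Euler step of dP/dt = V, dV/dt = U for a 2-D follower:
    position is driven by velocity, velocity by the control input."""
    p_new = (p[0] + v[0] * dt, p[1] + v[1] * dt)
    v_new = (v[0] + u[0] * dt, v[1] + u[1] * dt)
    return p_new, v_new
```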
suppose ζ^F = [(P^F)^T, (V^F)^T]^T denotes the position and velocity of the follower and ζ^L = [(P^L)^T, (V^L)^T]^T denotes the position and velocity of the leader, with H = [(H_p)^T, 0^T]^T as the relative offset vector between the two; for any given initial state of the follower and the leader, the condition for maintaining the formation is:

lim_{t→∞} || ζ^F(t) − ζ^L(t) − H || = 0  (1.11)
according to this condition, the following control protocol is assumed, where k_1, k_2 > 0:

U^F = −k_1 · (P^F − P^L − H_p) − k_2 · (V^F − V^L)  (1.12)
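A closed-loop sketch of this protocol on the double-integrator model: the follower accelerates toward the leader's position shifted by the offset H_p while matching the leader's velocity. The gain values, the stationary-leader scenario and the initial conditions are illustrative assumptions:

```python
def control_input(pf, vf, pl, vl, hp, k1=1.0, k2=2.0):
    """U^F = -k1 (P^F - P^L - H_p) - k2 (V^F - V^L), per formula (1.12)."""
    return tuple(-k1 * (pf[i] - pl[i] - hp[i]) - k2 * (vf[i] - vl[i])
                 for i in range(2))

def simulate(steps=500, dt=0.05):
    """Euler-integrate the follower under the protocol toward the offset
    slot relative to a stationary leader at the origin."""
    pf, vf = (5.0, 5.0), (0.0, 0.0)                # follower start
    pl, vl, hp = (0.0, 0.0), (0.0, 0.0), (-1.0, 0.0)  # leader and offset
    for _ in range(steps):
        u = control_input(pf, vf, pl, vl, hp)
        pf = (pf[0] + vf[0] * dt, pf[1] + vf[1] * dt)
        vf = (vf[0] + u[0] * dt, vf[1] + u[1] * dt)
    return pf
```

With these gains (k2^2 = 4 k1) the error dynamics are critically damped, so the follower settles at the offset position without oscillation.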
Step four: different constraint conditions are added on the basis of collision avoidance, so that the unmanned aerial vehicles maintain a certain formation in flight while avoiding collisions within the team; continuous operational optimization is realized through the model, and the expected output is a flexible flight strategy that maintains the formation and can return to the correct path after collision avoidance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111246299.5A CN114020013B (en) | 2021-10-26 | 2021-10-26 | Unmanned aerial vehicle formation collision avoidance method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114020013A CN114020013A (en) | 2022-02-08 |
CN114020013B true CN114020013B (en) | 2024-03-15 |
Family
ID=80057596
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114815882A (en) * | 2022-04-08 | 2022-07-29 | 北京航空航天大学 | Unmanned aerial vehicle autonomous formation intelligent control method based on reinforcement learning |
CN116069023B (en) * | 2022-12-20 | 2024-02-23 | 南京航空航天大学 | Multi-unmanned vehicle formation control method and system based on deep reinforcement learning |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109992000A (en) * | 2019-04-04 | 2019-07-09 | 北京航空航天大学 | A kind of multiple no-manned plane path collaborative planning method and device based on Hierarchical reinforcement learning |
WO2020056875A1 (en) * | 2018-09-20 | 2020-03-26 | 初速度(苏州)科技有限公司 | Parking strategy based on deep reinforcement learning |
CN111694365A (en) * | 2020-07-01 | 2020-09-22 | 武汉理工大学 | Unmanned ship formation path tracking method based on deep reinforcement learning |
JP2021034050A (en) * | 2019-08-21 | 2021-03-01 | 哈爾浜工程大学 | Auv action plan and operation control method based on reinforcement learning |
CN113110592A (en) * | 2021-04-23 | 2021-07-13 | 南京大学 | Unmanned aerial vehicle obstacle avoidance and path planning method |
CN113253733A (en) * | 2021-06-03 | 2021-08-13 | 杭州未名信科科技有限公司 | Navigation obstacle avoidance method, device and system based on learning and fusion |
CN113485323A (en) * | 2021-06-11 | 2021-10-08 | 同济大学 | Flexible formation method for cascaded multiple mobile robots |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||